(57) ABSTRACT


A method of creating a biological aging clock for a subject 
can include: (a) receiving a transcriptome signature derived 
from a tissue or organ of the subject; (b) creating input 
vectors based on the transcriptome signature; (c) inputting 
the input vectors into a machine learning platform; (d) 
generating a predicted biological aging clock of the tissue or 
organ based on the input vectors by the machine learning 
platform, wherein the biological aging clock is specific to 
the tissue or organ; and (e) preparing a report that includes 
the biological aging clock that identifies a predicted biologi- 
cal age of the tissue or organ. 
DEEP TRANSCRIPTOMIC MARKERS OF
HUMAN BIOLOGICAL AGING AND 
METHODS OF DETERMINING A 
BIOLOGICAL AGING CLOCK 


CROSS-REFERENCE 


This patent application claims priority to U.S. Provisional 
Application No. 62/547,061 filed Aug. 17, 2017, which 
provisional is incorporated herein by specific reference in its 
entirety. 

This patent application is also a continuation-in-part of 
U.S. application Ser. No. 16/044,784 filed Jul. 25, 2018, 
which claims priority to U.S. Provisional Application No. 
62/536,658 filed Jul. 25, 2017. 


BACKGROUND 


While aging may be a complex multifactorial process
with no single cause or treatment, whether aging can be
classified as a disease is widely debated. Many strategies
for extending organismal life spans have been proposed,
including replacing cells and organs, comprehensive
strategies for repairing accumulated damage, using
hormetins to activate endogenous repair processes, and
modulating the aging processes through specific mutations,
gene therapy, and small molecule drugs. An animal’s survival
strongly depends on its ability to maintain homeostasis,
achieved partly through intracellular and intercellular
communication within and among different tissues.

The lifespan of different cells and tissues varies substantially.
Although aging affects gene expression in multiple tissues,
the sets of affected genes are highly tissue specific and depend
on their functions in the tissue. Because regeneration rates and
the gene expression patterns associated with them vary, external
effectors, such as small molecules, have different effects on
different tissues. As a result, tissue-specific gene expression
signatures could provide information for interventions that
could bring the tissue, organ, or person back to a younger
state without additional adverse effects on other tissues.

Until recently, treatments and therapies for senescence 
reversal (aging reversal) have been rare, largely because of 
the complexity of the underlying mechanisms of senescence 
and the lack of tools for understanding and treating senes- 
cence. One example of drug development for senescence 
protection (rather than senescence reversal) can be seen in 
US 2017/0073735. Recent bioinformatics developments 
such as deep neural networks have opened up the possibility 
of developing highly-personalized senescence reversal treat- 
ments, based on gene expression of senescent tissues versus 
non-senescent tissues, as will be disclosed in the present 
invention. 

Presently, none of the proposed strategies for senescence 
treatment provide a roadmap for rapid screening, validation 
and clinical deployment. No methods currently exist to 
predict the effects of currently available drugs on human 
longevity and health span in a timely manner. 

Many biomarkers of aging have been proposed, including
telomere length, intracellular and extracellular aggregates,
racemization of amino acids, and genetic instability. Gene
expression and DNA methylation profiles change during
aging and may also be used as biomarkers of aging. Many
studies analyzing transcriptomes of biopsies in a variety of
diseases indicated that the age and sex of the patient have
significant effects on gene expression and that there are
noticeable changes in gene expression with age in mice,
resulting in the development of mouse aging gene expression
databases, and in humans.

Combinations of protein-protein interaction and gene 
expression in both flies and humans demonstrate that aging 
is mainly associated with a small number of biological 
processes, which might preferentially attack key regulatory 
nodes that are important for network stability. 

Work of the inventors, among others, with gene expres- 
sion and epigenetics of various solid tumors provided clues 
that transcription profiles of cells mapped onto the signaling 
pathways may be used to screen for and rate the targeted 
drugs that regulate pathways directly and indirectly related 
to aging and longevity. Prior studies suggest that a
combination of pathways, termed a pathway cloud, rather than one
element of a pathway or a whole pathway, might be
responsible for pathological changes in the cell.

The senescence response, including aging/senescence in
humans, causes striking changes in cellular phenotype.
According to Campisi and d'Adda di Fagagna (2007), the
senescent phenotype is induced by multiple stimuli.
Mitotically competent cells
respond to various stressors by undergoing cellular senes- 
cence. These stressors include dysfunctional telomeres, non- 
telomeric DNA damage, excessive mitogenic signals includ- 
ing those produced by oncogenes (which also cause DNA 
damage), non-genotoxic stress such as perturbations to 
chromatin organization and, probably, stresses with an as- 
yet unknown etiology. These changes include an essentially 
permanent arrest of cell proliferation, development of resis- 
tance to apoptosis (the death of some cells that occurs as a 
normal and controlled part of an organism's growth or 
development) and an altered pattern of gene expression. 
Also, senescence-associated markers such as
senescence-associated β-galactosidase, p16,
senescence-associated DNA-damage foci (SDFs), and
senescence-associated heterochromatin foci (SAHFs) are neither
universal nor exclusive to the senescent state.

Cellular senescence is thought to contribute to age-related 
tissue and organ dysfunction and various chronic age-related 
diseases through various mechanisms. Senescence is char- 
acterized by a persistent proliferative arrest in which cells 
display a distinct pro-inflammatory senescent-associated 
secretory phenotype (SASP) (Krimpenfort and Berns 2017). 
Whereas SASP exerts a supportive paracrine function during 
early development and wound healing (Demaria et al. 2014), 
the continuous secretion of these SASP factors has detri- 
mental effects on normal tissue homeostasis and is consid- 
ered to significantly contribute to aging (DiLoreto and 
Murphy 2015). 

In a cell-autonomous manner, senescence acts to deplete 
the various pools of cycling cells in an organism, including 
stem and progenitor cells. In this way, senescence interferes 
with tissue homeostasis and regeneration, and lays the 
groundwork for its cell-non-autonomous detrimental actions 
involving the SASP. There are at least five distinct paracrine 
mechanisms by which senescent cells are thought to pro- 
mote tissue dysfunction, including perturbation of the stem 
cell niche (causing stem cell dysfunction), disruption of 
extracellular matrix, induction of aberrant cell differentia- 
tion (both creating abnormal tissue architecture), stimulation 
of sterile tissue inflammation, and induction of senescence 
in neighboring cells (paracrine senescence). An emerging
yet untested concept is that post-mitotic, terminally differ- 
entiated cells that develop key properties of senescent cells 
might contribute to ageing and age-related disease through 
the same set of paracrine mechanisms (van Deursen 2014). 

Several recent observations support the hypothesis that
senescence is a highly dynamic, multi-step process, during
which the properties of senescent cells continuously evolve
and diversify, much like tumorigenesis but without cell
proliferation as a driver (De Cecco et al. 2013; Wang et al.
2011; Ivanov et al. 2013). This process includes not only
senescent cells but also the pre-senescent stage, which
means there is an opportunity to return the cell to
normal non-senescent behavior.

There has always been a need to reverse senescence, but
only recently have the necessary tools, particularly
developments in informatics and machine learning, become
available to develop and apply such senescence therapies and
treatments. Further, even commonly accepted biomarkers, and
metrics of such biomarkers, to assess aging have been lacking.

At least two general concepts of age exist in the art. One,
“chronological age”, is simply the actual calendar time an
organism or human has been alive. The other, called
“biological age” or “physiological age”, is a particular
focus of the present invention and is related to the physiological
health of the individual and biomarkers thereof. Biological
age is associated with how well organs and regulatory
systems of the body are performing and to what extent
general homeostasis at all levels of the organism is being
maintained, as such functions generally decline with time
and age.

The measurement of any physiological process of an
organism is typically done with a set of predefined
biomarkers. A biomarker can be defined as a characteristic that is
objectively measured and evaluated as an indicator of
normal biological processes, pathogenic processes, or
pharmacologic responses to a therapeutic intervention. Biomarkers
are chosen by scientists in order to measure a very
well-defined process within the body.

Given that aging in a multi-cellular organism is a
systemic process that cannot be readily captured by a
single uni-dimensional metric, or even by several metrics, the
development of an accurate and useful measure of biological age
(which can be thought of as a biological clock) is subject to
specific challenges. Such biomarkers must not only
be objective, quantifiable, and easily measurable
characteristics of the biological aging process, but must also
take into account that aging is not a single specific
process, but rather a suite of changes across multiple
physiological systems.

In other words, no single biomarker can provide an 
accurate overall biological clock age of a multi-cellular 
organism, nor can the biological age of a single cell, tissue, 
or organ, even when composed of many biomarkers, provide 
an accurate overall biological age of an organism. In
fact, it is often useful to have several biological clocks
assigned to an organism or human; that is, a different
biological age can be assigned to different cells, tissues, or
organs of that organism, and different clocks can be based on
different biomarkers or sets of biomarkers. Thus there may
be one clock for the skin, one for the liver, one clock based
on telomere length of a cell(s), tissue(s), or organ(s), and
another based on a different biomarker.

In the past, several attempts have been made to develop
adapted biomarkers for measuring biological aging.
However, the biomarkers used so far focus on monitoring a
restricted number of processes known to be directly
involved in the onset and propagation of aging-related
damage through the body. Examples of such biomarkers are
telomere length (Lehmann, 2013), intracellular and
extracellular aggregates, racemization of amino acids, and
genetic instability. Both gene expression (Wolters, 2013)
and DNA methylation profiles (Horvath, 2012; Horvath,
2013; Mendelsohn, 2013) change during aging and may be
used as biomarkers of aging, as demonstrated previously
with the epigenetic clock (Horvath, 2012; Horvath, 2013).
Many studies analyzing transcriptomes of biopsies in a
variety of diseases indicated that the age and sex of the patient
had significant effects on gene expression (Chowers, 2003)
and that there are noticeable changes in gene expression
with age in mice (Weindruch, 2002; Park, 2009), resulting in
the development of mouse aging gene expression databases
(Zahn, 2007), and in humans (Blalock, 2003; Welle, 2003;
Park, 2005; Hong, 2008; de Magalhães, 2009).


BRIEF DESCRIPTION OF THE FIGURES 


The foregoing and following information as well as other 
features of this disclosure will become more fully apparent 
from the following description and appended claims, taken 
in conjunction with the accompanying drawings. Under- 
standing that these drawings depict only several embodi- 
ments in accordance with the disclosure and are, therefore, 
not to be considered limiting of its scope, the disclosure will 
be described with additional specificity and detail through 
use of the accompanying drawings. 

FIG. 1 shows an embodiment of an age prediction pipe- 
line which is applied to patients with pre-senescent, senes- 
cent, fibrotic conditions or age-related diseases. 

FIG. 2 shows an embodiment of an age prediction pipe- 
line combined with iPANDA analysis used to select the 
personalized treatment. 

FIG. 3 illustrates the age predicted by the deep transcriptomic
clock method for biological aging assessment based on
blood transcriptomic profiles, compatible with the current
invention, vs. the actual chronological age of healthy individuals
in the validation set.

FIG. 4 illustrates the age predicted by the transcriptomic
clock method for biological aging assessment based on
muscle transcriptomic profiles, compatible with the current
invention, vs. the actual chronological age of healthy individuals
in the validation and testing sets.

FIG. 5 illustrates the age predicted by the deep transcriptomic
clock method for biological aging assessment based on
muscle transcriptomic profiles, compatible with the current
invention, vs. the actual chronological age groups of healthy
individuals in the external validation set.

FIG. 6 illustrates the distribution of the number of samples by
age for healthy individuals in the validation set.

FIG. 7 illustrates an example epsilon-prediction accuracy 
for healthy individuals. 

FIG. 8 illustrates clustering by age for healthy individuals
using the t-SNE algorithm.

FIG. 9 shows a list of the most important genes selected by the
Borda count algorithm applied over ranks assigned by deep
transcriptomic clocks, compatible with the current
invention, and other machine learning models as described.

FIG. 10 illustrates a Venn diagram showing organs, cells, 
and body fluids, and number of specific targets thereof. 

FIG. 11 illustrates the delta (difference between assigned 
(predicted) biological age and actual chronological age) bar 
plots grouped by age ranges for healthy people based on an 
exemplary validation set as described. 

FIG. 12 shows an example of a biological age clock, or a 
report thereof with a hazard ratio for different subgroups. 

FIG. 13 shows an example of a biological age clock, or a 
report thereof to compare various subgroups with actual age 
and predicted ages, and shows the delta (difference between 
assigned (predicted) biological age and actual chronological 
age) bar plots grouped by age ranges for healthy people
based on an exemplary validation set as described. 

FIG. 14 shows an example computing device 600 (e.g., a 
computer) that may be arranged in some embodiments to 
perform the methods (or portions thereof) described herein. 

The elements in the figures are arranged in accordance 
with at least one of the embodiments described herein, and 
which arrangement may be modified in accordance with the 
disclosure provided herein by one of ordinary skill in the art. 


DETAILED DESCRIPTION 


In the following detailed description, reference is made to 
the accompanying drawings, which form a part hereof. In 
the drawings, similar symbols typically identify similar 
components, unless context dictates otherwise. The illustra- 
tive embodiments described in the detailed description, 
drawings, and claims are not meant to be limiting. Other 
embodiments may be utilized, and other changes may be 
made, without departing from the spirit or scope of the 
subject matter presented herein. It will be readily understood 
that the aspects of the present disclosure, as generally 
described herein, and illustrated in the figures, can be 
arranged, substituted, combined, separated, and designed in 
a wide variety of different configurations, all of which are 
explicitly contemplated herein. 

Generally, the present invention relates to biomarkers of 
human biological aging. In some aspects, the invention 
relates to biomarkers based on gene expression, also called 
transcriptomic data, which provide metrics and estimates of 
the biological age of organisms, including humans. Thus, 
transcriptome aging clocks are provided based on such 
biomarkers and use thereof. Additionally, machine learning 
and deep learning techniques are utilized to assess the 
transcriptomic data and the biomarkers of human biological 
aging. The invention provides methods that can be utilized
to assess biological aging (e.g., computer methods
performed on transcriptomic data of a subject), and then treat
biological aging (e.g., therapeutic methods performed on
a subject). The invention includes methods, systems,
apparatuses, and computer program products, among others, to
carry out the following.

In some embodiments, a method of creating a biological 
aging clock for a patient is provided. The method can 
include receiving a transcriptome signature derived from a 
patient tissue or organ, which can be obtained by processing 
a biological sample to determine the transcriptome signa- 
ture, such as biomarkers. Based on the transcriptome sig- 
nature, the method can include providing input vectors to a 
machine learning platform. The machine learning platform 
processes the input vectors in order to generate output that
includes a predicted or determined biological age of a
sample, from which the biological age of the subject can
be predicted or determined. In some aspects, the biological
clock is specific to the tissue or organ, or specific to a
characteristic of the tissue or organ. In some aspects, the
method can include repeating one or more of the steps (e.g.,
receiving the transcriptome signature and/or inputting the input
vectors and/or generating output) for determining or creating 
a second biological aging clock, such as for the same 
subject, cell, organ or tissue, or a different subject, cell, 
organ or tissue. In some aspects, the two biological aging 
clocks are combined to create a synthetic biological aging 
clock that addresses biological aging at the tissue, organ, or 
organism level for the subject or more than one subject. In 
some aspects, the method can include repeating one or more 
of the steps a plurality of times to create a plurality of
biological aging clocks, such as for two or more organs in a subject,
or for two or more subjects. In some aspects, the transcrip- 
tome signature and/or input vectors and/or generated output 
is derived from a non-senescent tissue or organ of the patient 
or another organism. 
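
The flow described above (a transcriptome signature is received, input vectors are built from it, and a machine learning platform returns a predicted biological age for the tissue) can be sketched as follows. This is a minimal illustration only: the gene names, weights, intercept, and log transform are hypothetical, and a toy linear model stands in for the machine learning platform, which the disclosure describes as, for example, a deep neural network.

```python
import math

# Hypothetical gene panel, weights, and intercept, for illustration only.
GENE_PANEL = ["GENE_A", "GENE_B", "GENE_C"]
WEIGHTS = {"GENE_A": 4.0, "GENE_B": -2.0, "GENE_C": 1.5}
INTERCEPT = 30.0


def make_input_vector(signature, panel=GENE_PANEL):
    """Order expression values by the panel and log-transform them."""
    return [math.log2(signature.get(gene, 0.0) + 1.0) for gene in panel]


def predict_age(vector, panel=GENE_PANEL):
    """Toy linear stand-in for the machine learning platform."""
    return INTERCEPT + sum(WEIGHTS[g] * x for g, x in zip(panel, vector))


def make_report(tissue, signature, chronological_age=None):
    """Assemble a small report for one tissue-specific clock."""
    predicted = predict_age(make_input_vector(signature))
    report = {"tissue": tissue, "predicted_biological_age": round(predicted, 1)}
    if chronological_age is not None:
        # Positive delta: the tissue is predicted to be older than the subject.
        report["delta"] = round(predicted - chronological_age, 1)
    return report


# Hypothetical expression values for a blood sample.
blood_signature = {"GENE_A": 7.0, "GENE_B": 3.0, "GENE_C": 15.0}
report = make_report("blood", blood_signature, chronological_age=40)
print(report)
```

Repeating the same calls with a signature from another tissue would yield a second, tissue-specific clock for the same subject.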

In some aspects, the machine learning platform comprises 
one or more deep neural networks. In some aspects, the 
machine learning platform comprises one or more generative
adversarial networks. In some aspects, the machine learning 
platform comprises an adversarial autoencoder architecture. 
In some aspects, the machine learning platform comprises a 
feature importance analysis for ranking genes or gene sets 
by their importance in age prediction. 
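
As one possible illustration of ranking genes by importance, the sketch below aggregates several per-model gene rankings with a Borda count, the aggregation named in the description of FIG. 9. The gene names and the three input rankings are hypothetical, and the scoring convention (the top gene of n earns n - 1 points) is one common variant rather than a confirmed detail of the disclosed method.

```python
def borda_count(rankings):
    """Aggregate several best-first gene rankings into one consensus ranking."""
    scores = {}
    for ranking in rankings:
        n = len(ranking)
        for position, gene in enumerate(ranking):
            # The top-ranked gene earns n - 1 points, the last earns 0.
            scores[gene] = scores.get(gene, 0) + (n - 1 - position)
    # Sort by descending score; break ties alphabetically for determinism.
    return sorted(scores, key=lambda g: (-scores[g], g))


# Hypothetical importance rankings produced by three different models.
rankings = [
    ["GENE_A", "GENE_B", "GENE_C", "GENE_D"],
    ["GENE_B", "GENE_A", "GENE_D", "GENE_C"],
    ["GENE_A", "GENE_C", "GENE_B", "GENE_D"],
]
consensus = borda_count(rankings)
print(consensus)
```

The head of the consensus list would then be a candidate subset of genes for further analysis.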

In some aspects, a subset of the genes or gene sets are 
selected as targets for anti-aging therapies. This can be based 
on the transcriptome signature and/or input vectors and/or 
generated output. In some aspects, a subset of the genes or 
gene sets are selected as targets for aging rejuvenating 
therapies. 

In some aspects, the transcriptome signatures are based on 
signaling pathway activation signatures. In some aspects, the 
input transcriptome signature profiles are derived from a
microarray platform. In some aspects, the input
transcriptome signature profiles are derived from an RNA sequencing
platform. In some aspects, the biological clock is specific to 
a tissue or organ, or specific to a characteristic of the tissue 
or organ. 

In some aspects, the method can include comparing a 
predicted biological age of an individual with an actual 
chronological age of the individual. In some aspects, the 
method can include correlating a gene expression level with 
a predicted biological age of the individual. In some aspects, 
the method can include correlating a signaling pathway
signature with a predicted biological age of the individual. 
In some aspects, the method can include comparing a 
predicted biological age of an individual with an actual 
chronological age of the individual, wherein the comparison 
further comprises a prognosis of the life expectancy. In some 
aspects, the method can include comparing a predicted 
biological age of an individual with an actual chronological 
age of the individual, wherein the comparison further com- 
prises a prognosis of the life expectancy and probability of 
survival of patient during treatment. In some aspects, the 
method can include comparing a predicted biological age of 
an individual with an actual chronological age of the indi- 
vidual, wherein the comparison comprises an outcome mea- 
sure of the efficacy of the therapies. 
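
The comparison of predicted and chronological age can be sketched with two simple quantities: the per-individual delta (predicted minus chronological age) and an epsilon-accuracy. The epsilon-accuracy definition used here, the fraction of individuals whose predicted age falls within epsilon years of the chronological age, is an assumption, since the text names the measure without defining it. All ages below are made-up values.

```python
def age_deltas(predicted, chronological):
    """Per-individual difference: predicted biological age minus chronological age."""
    return [p - c for p, c in zip(predicted, chronological)]


def epsilon_accuracy(predicted, chronological, epsilon):
    """Fraction of individuals predicted within +/- epsilon years (assumed definition)."""
    deltas = age_deltas(predicted, chronological)
    hits = sum(1 for d in deltas if abs(d) <= epsilon)
    return hits / len(deltas)


# Hypothetical predicted and chronological ages for four individuals.
predicted = [34.0, 52.5, 61.0, 47.0]
chronological = [30, 50, 70, 45]

deltas = age_deltas(predicted, chronological)
accuracy = epsilon_accuracy(predicted, chronological, epsilon=5)
print(deltas, accuracy)
```

Tracking the delta for the same individual before and after a therapy is one way such a comparison could serve as an outcome measure.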

In some embodiments, a method can include developing 
a drug therapy based on the output. In some aspects, a 
method can include developing a senolytic therapy based on 
the generated output. In some aspects, a method can include 
developing a senoremediation therapy based on the generated
output. 

In part, because the method includes one or more
biomarkers of aging, it can be used to track the efficacy of
anti-aging therapies, such as senolytic and
senoremediation therapies. The method can predict survival or
life expectancy. Anti-aging drugs should increase life
expectancy, and the methods can be used to track whether the
administered drugs are increasing life expectancy (e.g.,
by decreasing the predicted age, making people younger, etc.).

In some aspects, a method can include developing an
actuarial risk assessment of mortality, survival, or morbidity
of an individual based on the generated output. In
some aspects, a method can include developing an insurance
assessment of an individual using mortality and survival
analysis, existing health conditions, and whether the applicant
smokes, based on the generated output.

The invention also includes methods for creating a bio- 
logical aging clock for a patient, the method comprising: (a) 
receiving a first transcriptome signature derived from a 
patient tissue or organ; (b) receiving a second
transcriptome signature derived from a baseline; and (c) computing
a difference between predicted ages for the signature of (a) 
and the signature of (b). 
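
Steps (a) through (c) above can be sketched as a single subtraction once an age predictor is available. In this illustration the predictor is a hypothetical toy function of mean expression, not the disclosed model, and the two signatures are made-up values.

```python
def predicted_age_difference(patient_signature, baseline_signature, predict):
    """Difference between ages predicted for the patient and baseline signatures."""
    return predict(patient_signature) - predict(baseline_signature)


def toy_predict(signature):
    """Hypothetical stand-in predictor: a linear function of mean expression."""
    return 20.0 + 5.0 * sum(signature) / len(signature)


# Hypothetical expression vectors over the same ordered gene panel.
patient = [6.0, 8.0, 10.0]
baseline = [4.0, 6.0, 8.0]

difference = predicted_age_difference(patient, baseline, toy_predict)
print(difference)
```

A positive difference would indicate that the patient tissue is predicted to be older than the baseline.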

In some aspects, the method can provide input vectors to 
a machine learning platform, wherein the machine learning 
platform outputs classification vectors that comprise com- 
ponents of a biological aging clock. 

In some embodiments, a computer program product is 
provided on a tangible non-transitory computer readable 
medium that has a computer readable program code embod- 
ied therein, the program code being executable by a proces- 
sor of a computer or computing system to perform a method 
for generating or determining a biological aging clock for a 
patient. Such a method can include receiving a transcrip- 
tome signature derived from a patient tissue or organ (Step 
(a)). The method can include creating input vectors based on 
the transcriptome signature. The method can include pro- 
viding input vectors to a machine learning platform (Step 
(b)). The method can include the machine learning platform 
generating output that includes a predicted biological age of 
a sample from the patient tissue or organ (Step (c)). In some 
aspects, the biological aging clock is specific to the tissue or 
organ, or specific to a characteristic of the tissue or organ. In 
some aspects, the machine learning platform includes the 
examples and embodiments thereof described herein or 
known in the art. The biological aging clock can be consid- 
ered a method that can be operated to predict the biological 
age of a tissue, organ, or subject, and then compare the 
predicted biological age with the actual age of the subject. 

In some embodiments, the method performed by the 
computer program product can include repeating any of Steps
(a), (b), and (c) to create a second biological aging clock. In
some aspects, the two or more biological aging clocks are 
combined to create a synthetic biological aging clock that 
addresses biological aging at the tissue, organ, or organism 
level. In some aspects, the method can include repeating 
Steps (a) and (b) a plurality of times to create a plurality of
biological aging clocks. In some aspects, the transcriptomic
signature of Step (a) and/or the profile of Step (b) is derived
from a non-senescent tissue or organ of the patient or 
another organism. In some aspects, a subset of the genes or 
gene sets are selected as targets for anti-aging therapies. In 
some aspects, a subset of the genes or gene sets are selected 
as targets for aging rejuvenating therapies. In some aspects, 
the transcriptome signatures are based on signaling pathway 
activation signatures. In some aspects, the input
transcriptome signature profiles are derived from a microarray
platform. In some aspects, the input transcriptome signature
profiles are derived from an RNA sequencing platform. In
some aspects, the biological clock is specific to a tissue or 
organ, or specific to a characteristic of the tissue or organ. 

The biological aging clocks have been developed using
different methods and different tissues. In some instances, a
biological aging clock developed using transcriptomic
data extracted from blood profiles can be combined with a
clock developed using proteomic data from blood profiles,
or with a clock that was built for skin tissue and blood. In the
case of a ‘synthetic’ clock, the predicted biological
age is obtained from multiple biological aging clocks that are
combined.
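
The text does not specify how individual clocks are combined into a synthetic clock. One simple possibility, shown here purely as an assumption, is a weighted average of the ages predicted by each clock; the clock names, ages, and weights are hypothetical.

```python
def synthetic_age(clock_ages, weights=None):
    """Combine per-clock predicted ages into one age by weighted averaging."""
    if weights is None:
        # Default assumption: every clock contributes equally.
        weights = {name: 1.0 for name in clock_ages}
    total = sum(weights[name] for name in clock_ages)
    return sum(clock_ages[name] * weights[name] for name in clock_ages) / total


# Hypothetical predicted ages from three separately built clocks.
clock_ages = {
    "blood_transcriptomic": 46.0,
    "blood_proteomic": 50.0,
    "skin_transcriptomic": 42.0,
}

equal_weighted = synthetic_age(clock_ages)
custom_weighted = synthetic_age(
    clock_ages,
    {"blood_transcriptomic": 1.0, "blood_proteomic": 2.0, "skin_transcriptomic": 1.0},
)
print(equal_weighted, custom_weighted)
```

Other combination rules, such as feeding the per-clock outputs into a further model, would fit the same interface.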

In some embodiments, the method performed by the 
computer program product can include comparing a
predicted biological age of an individual with an actual
chronological age of the individual. In some aspects, the method
can include correlating a gene expression level with a 
predicted biological age of the individual. In some aspects, 
the method can include correlating a signaling pathway 
signature with a predicted biological age of the individual. 
In some aspects, the method can include comparing a 
predicted biological age of an individual with an actual 
chronological age of the individual, wherein the comparison 
further comprises a prognosis of the life expectancy. In some
aspects, the method can include comparing a predicted 
biological age of an individual with an actual chronological 
age of the individual, wherein the comparison further com- 
prises a prognosis of the life expectancy and probability of 
survival of patient during treatment. In some aspects, the 
method can include comparing a predicted biological age of 
an individual with an actual chronological age of the indi- 
vidual, wherein the comparison comprises an outcome mea- 
sure of the efficacy of the therapies. 

In some embodiments, the method performed by the 
computer program product can include developing a drug 
therapy based on the output. In some aspects, the method can 
include developing a senolytic therapy based on the output. 
In some aspects, the method can include developing a 
senoremediation therapy based on the output. In some
aspects, the method can include developing an actuarial
assessment of an individual based on the output. In some
aspects, the method can include developing a risk
assessment of an individual based on the output. In some
aspects, the method can include developing an insurance
assessment of an individual based on the output.

In some embodiments, a method of creating a biological 
aging clock for a patient is provided. Such a method can
include: Step (a) receiving a first transcriptome signature
derived from a patient tissue or organ; Step (b) receiving a
second transcriptome signature derived from a baseline;
and Step (c) computing a difference between the signature of 
(a) and the signature of (b) in order to determine input 
vectors. Step (d) can include inputting the input vectors into 
a machine learning platform. Step (e) can include predicting
an age from the signature of (a) and from the signature
of (b) in order to compare the estimated age values. In some
aspects, at least one of the transcriptome signatures is based 
on an in silico signaling pathway activation network decom- 
position, which is a decomposition performed with a 
machine learning platform, such as one described herein or 
otherwise known or created. In some aspects, the biological 
clock is specific to the tissue or organ, or specific to a 
characteristic of the tissue or organ. In some aspects, the 
method can include repeating any one or more of Step (a), 
Step (b), Step (c), Step (d), and/or Step (e) to create a second 
biological aging clock. In some aspects, the two biological 
aging clocks are combined to create a synthetic biological 
aging clock that addresses biological aging at the tissue, 
organ, or organism level. In some aspects, the method can 
include repeating any one or more of Step (a), Step (b), Step 
(c), Step (d), and/or Step (e) a plurality of times to create a 
plurality of biological aging clocks. In some aspects, Step (a)
and/or Step (b) is derived from a non-senescent tissue or 
organ of the patient or another organism, preferably Step (b). 

In some embodiments, a computer program product can 
include a tangible non-transitory computer readable medium 
having a computer readable program code stored therein, the 
program code being executable by a processor of a computer 
or computing system to perform a method for creating a biological
aging clock for a patient. The method can be a computational
method as described herein. The computational method can
include: (a) receiving data of a first transcriptome signature
derived from a patient tissue or organ; (b) receiving data of
a second transcriptome signature derived from a
baseline; and (c) computing a difference between the signature of
Step (a) and the signature of Step (b) in order to determine
input vectors. Step (d)
can include inputting the input vectors into a machine 
learning platform. Step (e) can include causing the machine 
learning platform to generate output classification vectors 
that include components of a biological aging clock. In some 
aspects, at least one of the transcriptome signatures is based 
on an in silico signaling pathway activation network decom- 
position, which is a decomposition performed with a 
machine learning platform, such as one described herein or 
otherwise known or created. The computational method can 
include any other computing steps described herein. The 
biological clock can be specific to the tissue or organ, or 
specific to a characteristic of the tissue or organ. 
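
Steps (a) through (e) of the computational method can be illustrated with a minimal Python sketch. The sketch assumes, purely for illustration, that each transcriptome signature is a mapping from gene symbol to expression value and that the machine learning platform is any trained callable that maps an input vector to an output vector. The names `difference_vector` and `run_clock` are hypothetical and are not defined by this disclosure.

```python
from typing import Callable, Dict, List, Sequence

Signature = Dict[str, float]  # assumed layout: gene symbol -> expression value


def difference_vector(patient: Signature, baseline: Signature) -> List[float]:
    """Step (c): patient-minus-baseline difference over the shared genes.

    Genes are ordered alphabetically so the vector layout is reproducible.
    """
    shared_genes = sorted(set(patient) & set(baseline))
    return [patient[g] - baseline[g] for g in shared_genes]


def run_clock(
    patient: Signature,
    baseline: Signature,
    platform: Callable[[Sequence[float]], List[float]],
) -> List[float]:
    """Steps (d) and (e): pass the input vector to a trained platform.

    `platform` is a stand-in for any trained model; it is not defined here.
    """
    return platform(difference_vector(patient, baseline))
```

Genes present in only one of the two signatures are dropped in this sketch; other handling of missing genes may be equally suitable.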

In some aspects, the computational method can include 
repeating any one or more of Step (a), Step (b), Step (c), Step 
(d), and/or Step (e) to create a second biological aging clock. 
In some aspects, the two biological aging clocks are com- 
bined to create a synthetic biological aging clock that 
addresses biological aging at the tissue, organ, or organism 
level. In some aspects, the computational method can 
include repeating any one or more of Step (a), Step (b), Step 
(c), Step (d), and/or Step (e) a plurality of times to create a 
plurality of biological aging clocks. In some aspects, Step (a)
and/or Step (b) is derived from a non-senescent tissue or 
organ of the patient or another organism, preferably Step (b). 
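
One possible way of combining two or more biological aging clocks into a synthetic biological aging clock is sketched below. The disclosure does not fix a combination rule; a weighted mean of the per-tissue predicted ages is assumed here only as an example, and the name `combine_clocks` and the tissue labels are hypothetical.

```python
from typing import Dict, Optional


def combine_clocks(
    predicted_ages: Dict[str, float],
    weights: Optional[Dict[str, float]] = None,
) -> float:
    """Weighted mean of per-tissue predicted ages (equal weights by default)."""
    if not predicted_ages:
        raise ValueError("at least one biological aging clock is required")
    if weights is None:
        weights = {tissue: 1.0 for tissue in predicted_ages}
    total = sum(weights[t] for t in predicted_ages)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(predicted_ages[t] * weights[t] for t in predicted_ages) / total
```

For example, `combine_clocks({"liver": 50.0, "lung": 60.0})` yields an organism-level value midway between the two tissue-level predictions.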

The present invention also relates to a multi-stage thera- 
peutic for treating senescence (aging) of whole organisms 
(in particular, human individuals), as well as the organism's 
underlying cellular, tissue, and organ senescence. The pres- 
ent invention also relates to evaluation of efficacy of such 
therapeutic. Methods and systems for applying such thera- 
peutic treatment, as well as informatics and other tools for 
developing the therapeutic treatments, are disclosed. Since 
disease and senescence are often associated, the invention is 
also applicable to treating disease. The therapeutic can be 
determined based on the biological clock that is determined 
in the methods described herein. The method for creating a
biological aging clock for a patient can also include using the
output thereof to determine a therapeutic.

The therapeutic can be the 5R strategy described herein. 

The present disclosure provides compositions and meth- 
ods for a 5R (Rescue, Remove, Replenish, Reinforce, 
Repeat) strategy for selectively rescuing pre-senescent cells,
removing senescent cells, replenishing and reinforcing with
new healthy cells, and repeating the procedure, wherein the
composition comprises a group of senolytics and
derivatives thereof. The strategy of 5R may delay aging
and/or treat age-related disorders especially fibrotic and 
senofibrotic disorders primarily in lungs and liver. 

This 5R method may delay aging and/or treat age-related 
disorders especially fibrotic and senofibrotic disorders pri- 
marily in lungs, liver and skin. The 5R strategy as described 
is applied to patients with pre-senescent, senescent, and 
fibrotic conditions, among others. Drugs to be used include 
senoremediators, antifibrotic agents, and senolytics. The 5R 
approach will result in induction of regeneration. Drug 
repurposing strategy can be part of the therapy development 
process once the therapy protocols have been designed. 

FIG. 1 shows an embodiment of an age predicting strategy,
which is applied to patients with pre-senescent, senescent
or age-related disease conditions. The following steps
can be performed in any method described herein: 1. Single
biopsy procedure; 2. Sample preparation and Microarray,
RNA-seq profiles extraction; 3. Gene and gene sets
annotations and expression values extraction; 4. Aging clock
analysis; 5. Age prediction; 6. Repeat single biopsy
procedure of tissues of individuals after a course of aging therapy;
7. Sample preparation and Microarray, RNA-seq profiles
extraction; 8. Gene and gene sets annotations and expression
values extraction; 9. Repeat aging clock analysis; 10. Age
prediction; and 11. Comparison of predicted age values
before and after treatment. Any one of these steps may be
performed alone or in combination with other steps as recited
herein. In some instances, the methods can include obtaining
data and processing the data to obtain a recommendation for 
a treatment protocol. The recommended treatment protocol 
can then be implemented on the patient in accordance with 
parameters of the treatment protocol. That is, the aspects
of the treatment protocol cannot be performed without the
computationally generated instructions to do so. As such,
obtaining the
instructions, such as the type of drug and/or natural product 
or specific drug and/or natural product or combination of 
drugs and/or natural product, can be vital for performing the 
treatment protocol. 
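
The comparison of predicted age values before and after treatment (step 11) can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `compare_predicted_ages`, the returned keys, and the optional `tolerance` parameter (a band within which the two predictions are treated as equal) are hypothetical and are not part of the described method.

```python
from typing import Dict


def compare_predicted_ages(
    before: float, after: float, tolerance: float = 0.0
) -> Dict[str, object]:
    """Step 11: change in predicted biological age across a course of therapy."""
    change = after - before
    if change < -tolerance:
        direction = "decreased"
    elif change > tolerance:
        direction = "increased"
    else:
        direction = "unchanged"
    return {"change_in_years": change, "direction": direction}
```

The sketch does not correct for the calendar time elapsed between the two biopsies; such a correction may be added where the course of therapy is long.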

In some instances, the treatment protocol can be obtained 
by steps 1, 2, 3, 4, and/or 5. Some of these steps may be
omitted, such as steps 1 and 2 when the sample is obtained
already prepared. In some instances, the data from step 2 may
be obtained and provided into a computing system for step 3
and/or 4.

In some instances, there is a step 3a, wherein a determined 
treatment protocol is provided by step 3 and/or step 4, 
respectively. The determined treatment protocol can include 
a list of one or more drugs and/or natural products or treatment
actions for each treatment step subsequent to steps 3 and/or 
4. 

The invention includes developing a personalized drug 
treatment. 

FIG. 2 illustrates the strategy of age prediction in the case
of personalized drug and/or natural product treatment. The
following steps can be performed in any method described 
herein: 1. Single biopsy procedure; 2. Sample preparation 
and Microarray, RNA-seq profiles extraction; 3. Gene and 
gene sets annotations and expression values extraction; 4. 
Aging clock analysis; 5. Age prediction; 6. iPANDA analysis;
7. for personalized treatment protocol prediction; 8.
Repeat single biopsy procedure of tissues of individuals 
after a course of aging therapy; 9. Sample preparation and
Microarray, RNA-seq profiles extraction; 10. Gene and gene
sets annotations and expression values extraction; 9. Repeat 
aging clock analysis; 11. Age prediction; 12. Comparison of 
predicted age values before and after treatment. 

The method of personalized treatment protocol prediction 
may include: (a) receiving a first transcriptome signature 
derived from a patient tissue or organ; (b) receiving a second
transcriptome signature derived from a baseline; (c)
creating a difference matrix, such as in a computer with a 
model or neural network or machine learning, using the 
profile of (a) and the profile of (b); (d) receiving a cellular 
signature library; (e) receiving a drug therapeutic use 
library; (f) using the matrix of (c), the library of (d), and the 
library of (e) to provide input vectors to a machine learning 
platform, wherein the machine learning platform outputs 
classification vectors on one or more drugs, wherein the 
personalized drug treatment is comprised of the classifica- 
tion vectors. 
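
The flow from the difference of (c) to a ranking of candidate drugs can be illustrated with a simplified Python sketch. It is not the machine learning platform of the method: a plain cosine-based reversal score is substituted for the platform, and a single hypothetical library of drug-induced expression signatures stands in for the libraries of (d) and (e). The names `cosine` and `rank_drugs`, and the assumption that all vectors share one gene order, are illustrative only.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 when either vector is all zeros."""
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(u, v)) / (norm_u * norm_v)


def rank_drugs(
    difference: Sequence[float],
    drug_signatures: Dict[str, Sequence[float]],
) -> List[Tuple[str, float]]:
    """Rank drugs by how strongly their signature opposes the patient difference.

    Score = -cosine(difference, drug signature); higher means stronger reversal.
    """
    scores = [(name, -cosine(difference, sig)) for name, sig in drug_signatures.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)
```

In this sketch a drug whose signature is the mirror image of the patient-versus-baseline difference receives the highest score; a trained classifier would replace this scoring step in practice.
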
The transcriptome signature may be based on a signature
signaling pathway activation network analysis on a computer.
One of the transcriptome signatures is based on an in
silico signaling pathway activation network decomposition. 
One of the profiles may comprise a Pearson correlation 
matrix. The personalized drug treatment may comprise a 
senescence treatment for the patient. The profile of (b), the
second transcriptome signature derived from a baseline,
may be derived from a non-senescent tissue or organ
of the patient or another subject. The method may include 
the machine learning platform comprising one or more deep 
neural networks. The method may include the machine 
learning platform comprising at least two generative adver- 
sarial networks and may comprise an adversarial autoen- 
coder architecture. The personalized drug treatment may be 
created by prescribing drugs identified by the classification 
vectors at their lowest effective dose. 
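
Where a profile comprises a Pearson correlation matrix, the matrix can be computed as in the following sketch. The layout is an assumption made for illustration: each row of `profiles` is taken to be one expression profile measured over the same ordered set of genes or samples. The function names are hypothetical.

```python
import math
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    dev_x = [a - mean_x for a in x]
    dev_y = [b - mean_y for b in y]
    ss_x = sum(d * d for d in dev_x)
    ss_y = sum(d * d for d in dev_y)
    if ss_x == 0.0 or ss_y == 0.0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sum(a * b for a, b in zip(dev_x, dev_y)) / math.sqrt(ss_x * ss_y)


def correlation_matrix(profiles: Sequence[Sequence[float]]) -> List[List[float]]:
    """Symmetric matrix of pairwise Pearson correlations between rows."""
    return [[pearson(p, q) for q in profiles] for p in profiles]
```

The resulting matrix is symmetric with a unit diagonal, so only its upper triangle need be supplied as input to a downstream model.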

The invention includes a method of computationally, with 
a computer, designing a treatment protocol for a patient 
comprising one or more drugs, the method comprising: (a) 
identifying a gene expression signature of the patient; (b) 
defining a patient score for signatures taken from one or 
more patient tissues or organs; (c) selecting drugs based 
upon (a) and/or (b); and (d) defining a lowest effective 
combination for each drug. The method may include the
gene expression signature being based on a signature
signaling pathway activation network analysis, wherein the
gene expression signature is based on an in silico signaling
pathway activation network decomposition, wherein the
gene expression signature comprises a transcriptome
Pearson correlation matrix. The method can then include one or
more treatment steps with one or more treatment drugs or 
treatment steps of any of the treatment methods described 
herein. 

The protocol may be a senescence treatment for the 
patient. The method may include wherein: the signature of
(a), the gene expression signature of the patient, is derived,
using a computer with appropriate algorithms or models
(e.g., neural network), from a non-senescent tissue or organ
of the patient or another subject, wherein (b) and (c) are
carried out on a machine learning platform, wherein the 
machine learning platform comprises at least two generative 
adversarial networks, wherein the machine learning plat- 
form comprises an adversarial autoencoder architecture, 
wherein the machine learning platform comprises one or 
more deep neural networks. 

In some embodiments, a computer program product can 
include a non-transitory computer readable medium having 
a computer readable program code embodied therein, the 
product being executable by a processor to perform a
method of developing a personalized drug treatment for a
patient, the method comprising: (a) receiving a first
transcriptome signature derived from a patient tissue or organ; (b)
receiving a second transcriptome signature derived from
a baseline; (c) creating a difference matrix using the profile 
of (a) and the profile of (b); (d) receiving a cellular signature 
library; (e) receiving a drug therapeutic use library; (f) using 
the matrix of (c), the library of (d) and/or (e), to provide 
input vectors to a machine learning platform, wherein the 
machine learning platform outputs classification vectors on 
one or more drugs, wherein the personalized drug treatment 
is comprised of the classification vectors. 

A transcriptome signature representing tissue or organ 
senescence may be used to develop the biological aging 
clock, and then used to develop or identify at least one of the 
drugs used in the therapeutics described herein. The
transcriptome signature may be a signaling pathway activation
network analysis, which is performed on a computer with 
models as described herein. The transcriptome signature 
may be used in the following manner: as a signaling pathway 
activation network analysis, the transcriptome signature is 
used as input to a machine learning platform that outputs 
drug classifications. The transcriptome signature is com- 
pared to a baseline transcriptome signature that represents a 
less senescent version of the patient's tissue or organ, and 
the transcriptome signature is compared to a baseline tran- 
scriptome signature that is constructed from more than one 
tissue or organ transcriptome signature. 

The computer processing can include input and/or processing
of a complete or partial schematic overview of the
biochemistry of senescence. Additional information can be
obtained in the incorporated provisional application regarding
the biological pathways that can be used as input and
processing for determining a treatment, such as specific
drugs for the treatment. Accordingly, the biological pathways
can be used in the methods described herein. Such
biological pathways are described herein with some
examples of computer processing thereof for implementing the
design of treatment protocols as recited herein.

A variety of cell-intrinsic and -extrinsic stresses that can 
activate the cellular senescence program can be used as 
input for a simulation or other computer processing. The 
biological pathways that are known, such as in the literature, 
can be analyzed for specific biological steps that are per- 
formed. Modulation of the biological step either to increase 
the activity or decrease the activity results in a cascading 
series of events in response to the modulated activity. The 
modulations can be with drugs, substances, or other affirmative
actions that effect a modulation of the biological
pathway. This modulation can be measured for a defined 
biological step. The biological step and the change in 
response to the modulation activity can be used as inputs 
into computer models, and such computer models can be 
trained on the data. Now, with the increase in artificial 
intelligence and deep learning algorithms, such biological 
steps, the modulation activity, and the changed response can 
be used with such computer models for modeling biological 
pathways. This can allow for determining a modulation 
activity for one or more biological steps. Such modulation
activities can be real and based on the simulations, such as
being a real drug, substance, or medical action. The output 
of the computer models can be instructions or other infor- 
mation for causing the modulation activity in order to obtain 
a specific type of biological step modulation so that the end 
goal of a specifically modulated biological pathway can be 
obtained. Accordingly, the biological pathways described 
herein, or in the incorporated references and provisional 
applications, can be used as the biological pathways for the 
treatment protocols described herein. 

In a specific example, the biological pathways can relate 
to senescence, and the modulation thereof. 

The biological pathways related to senescence can be 
used for computer models. Stressors are known to cause 
biological pathway modulation that results in senescence. 
For example, some stressors engage various cellular signal- 
ing cascades and can ultimately activate p53, p16Ink4a, or 
both. Some stress types that activate p53 through DDR 
signaling can be analyzed and computed. This can include 
computationally processing the ROS to elicit the DDR by 
perturbing gene transcription and DNA replication, as well 
as by shortening telomeres. The computer can also compute 
biological pathways of activated p53 that induces p21, 
which induces a temporal cell-cycle arrest by inhibiting
cyclin E-Cdk2, which can be processed. The computer can
also analyze how p16Ink4a also inhibits cell-cycle progres- 
sion by targeting cyclin D-Cdk4 and cyclin D-Cdk6 com- 
plexes. Both p21 and p16Ink4a act by preventing the inac- 
tivation of Rb, thus resulting in continued repression of E2F 
target genes required for S-phase onset. Upon severe stress,
as modeled and computationally processed, temporally
arrested cells can be determined to transition into a senescent
growth arrest through a mechanism that is currently
incompletely understood. Cells exposed to mild damage that
can be successfully repaired may resume normal cell-cycle 
progression. On the other hand, cells exposed to moderate 
stress that is chronic in nature or that leaves permanent 
damage may resume proliferation through reliance on stress 
support pathways, and such information may be included in 
the data processing. This phenomenon (termed assisted 
cycling) is enabled by p53-mediated activation of p21, 
which can be taken into account when computationally
determining a treatment, such as a drug treatment. Thus, the
p53-p21 pathway can either antagonize or synergize with 
p16Ink4a in senescence depending on the type and level of 
stress that is used in the computational processing. BRAF 
(V600E) is unusual in that it establishes senescence through 
a metabolic effector pathway. BRAF(V600E) activates PDH 
by inducing PDP2 and inhibiting PDK1 expression, promot- 
ing a shift from glycolysis to oxidative phosphorylation that 
creates senescence-inducing redox stress, which can be 
taken into account in the computational processing. Cells 
undergoing senescence induce an inflammatory transcrip- 
tome regardless of the senescence inducing stress, and such 
inflammatory transcriptome can be considered in determin- 
ing the treatment. Also, senescence-promoting and senes- 
cence-preventing activities may be computed, and may be 
weighted relative to their importance. A senescence-revers- 
ing mechanism may be input or modeled or otherwise 
computed as part of the process. 
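
The core logic of the p53-p21 and p16Ink4a arms described above can be expressed as a small Boolean model. The sketch is a deliberate simplification offered only as an illustration: it assumes that Rb is inactivated only when both cyclin E-Cdk2 and cyclin D-Cdk4/6 are active, and it omits stress level, assisted cycling, and the BRAF(V600E) metabolic route. The function name and dictionary keys are hypothetical.

```python
from typing import Dict


def cell_cycle_state(p53_active: bool, p16_active: bool) -> Dict[str, bool]:
    """Boolean reading of the p53-p21 and p16Ink4a arms described above."""
    p21 = p53_active                   # activated p53 induces p21
    cyclin_e_cdk2 = not p21            # p21 inhibits cyclin E-Cdk2
    cyclin_d_cdk46 = not p16_active    # p16Ink4a inhibits cyclin D-Cdk4/6
    # Simplifying assumption: Rb is inactivated only when both kinase
    # complexes are active, so either inhibitor keeps Rb active.
    rb_inactivated = cyclin_e_cdk2 and cyclin_d_cdk46
    e2f_targets_expressed = rb_inactivated  # active Rb represses E2F targets
    return {
        "p21": p21,
        "rb_inactivated": rb_inactivated,
        "e2f_targets_expressed": e2f_targets_expressed,
        "cell_cycle_arrest": not e2f_targets_expressed,
    }
```

Under these assumptions, activation of either p53 or p16Ink4a alone is sufficient to hold the modeled cell in arrest.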

A multi-step senescence model can also be input and 
computed. The model can be programmed to consider 
cellular senescence as a dynamic process driven by epigen- 
etic and genetic changes. An initial step computes the 
progression from a transient to a stable cell-cycle arrest 
through analysis of a sustained activation of the p16Ink4a 
and/or p53-p21 pathways. The model can consider that the
resulting early senescent cells progress to full senescence by
downregulating lamin B1, thereby triggering extensive chro- 
matin remodeling underlying the production of a SASP. The 
model can consider certain components of the SASP that are 
highly conserved, whereas others may vary depending on 
cell type, nature of the senescence-inducing stressor, or 
cell-to-cell variability in chromatin remodeling. The com- 
putation can consider progression to deep or late senescence 
that may be driven by additional genetic and epigenetic 
changes, which can be computed, including chromatin bud- 
ding, histone proteolysis and retrotransposition, driving fur- 
ther transcriptional change and SASP heterogeneity. The 
computation can consider the efficiency with which immune
cells dispose of senescent cells, which may be dependent
on the composition of the SASP. The proinflammatory
signature of the SASP can fade due to expression of par- 
ticular microRNAs late into the senescence program, 
thereby perhaps allowing evasion of immuno-clearance, 
which can also be considered. 
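
The multi-step progression described above can be represented as a simple state machine. The stage and event names in this sketch are illustrative labels chosen for readability and are not terms fixed by the disclosure; the sketch also omits SASP heterogeneity and immune clearance.

```python
from typing import Dict, Iterable, Tuple

# Stage and event names are illustrative labels, not terms fixed by the text.
TRANSITIONS: Dict[Tuple[str, str], str] = {
    ("transient_arrest", "sustained_p16_or_p53_p21_activation"): "early_senescence",
    ("early_senescence", "lamin_b1_downregulation"): "full_senescence",
    ("full_senescence", "additional_genetic_epigenetic_changes"): "late_senescence",
}


def advance(stage: str, events: Iterable[str]) -> str:
    """Apply events in order; events that do not apply to the stage are ignored."""
    for event in events:
        stage = TRANSITIONS.get((stage, event), stage)
    return stage
```

Because each transition is keyed on the current stage, an event such as lamin B1 downregulation has no effect in this sketch unless the stable arrest has already been established.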

In some embodiments, a conceptual model can be com- 
puted in which senescent cells are subdivided into two main 
classes based on kinetics of senescence induction and func- 
tionality. The conceptual model can consider that acute 
senescence is induced through cell-extrinsic stimuli that
target a specific population of cells in the tissue. Acute
senescent cells self-organize their elimination through SASP 
components that attract various types of immune cells. The 
conceptual model can be programmed to consider that 
induction of chronic senescence occurs after periods of 
progressive cellular stress or macromolecular damage when 
tardy cycling transitions into a stable cell-cycle arrest. The
conceptual model can consider that, owing to age-related
immunodeficiency or production of less proinflammatory SASPs,
immune cells may inefficiently eliminate chronic senescent
cells, allowing continuation of multi-step senescence. For
example, the conceptual model may consider that senes- 
cence induced during cancer therapy may initially be acute 
and later chronic in nature. 

The computer models can be programmed and receive
senescence input data for computing how senescence promotes
age-related tissue dysfunction. Senescence contributes
to the overall decline in tissue regenerative potential
that occurs with ageing. The computer models can be
programmed with the observation that progenitor cell
populations in both skeletal muscle and fat tissue of BubR1
progeroid mice are highly prone to cellular senescence. 
Proteases chronically secreted by senescent cells may per- 
turb tissue structure and organization by cleaving mem- 
brane-bound receptors, signaling ligands, extracellular 
matrix proteins or other components in the tissue microen- 
vironment, which can affect the treatment protocols 
described herein. In addition, other SASP components,
including IL-6 and IL-8, may stimulate tissue fibrosis in
certain epithelial tissues by inducing EMT, which may be
considered. Chronic tissue inflammation, which is characterized by
infiltration of macrophages and lymphocytes, fibrosis and 
cell death, is associated with ageing and has a causal role in 
the development of various age-related diseases, which can 
be considered during identifying a treatment. 

The matrix metalloproteinases and proinflammatory 
SASP components can be modeled and considered in determining
a treatment because of their ability to create a tissue
microenvironment that promotes survival, proliferation and 
dissemination of neoplastic cells. The model can be pro- 
cessed so that SASP can be modeled for increasing age- 
related tissue deterioration through paracrine senescence, 
where senescent cells spread the senescence phenotype to 
healthy neighboring cells through secretion of IL-1β, TGFβ
and certain chemokine ligands. With gene expression analysis
or pathway analysis it is possible to distinguish between
pre-senescent and senescent cell signatures with the
computations.

The models can be computed to consider that killing 
senescent cells can lead to rejuvenation of the tissue. For 
example, a modified FOXO4-p53 interfering peptide can be
considered that causes p53 nuclear exclusion and induces
targeted apoptosis of senescent cells (TASC), which
neutralizes murine liver chemotoxicity from doxorubicin
treatment. The TASC can be considered for restoring fitness,
hair density, and renal function in fast-aging and naturally
aged mice.

The model can be processed so that delaying senescence
or even promoting death of accumulating apoptosis-resistant
senescent cells can be a strategy to prevent age-related
diseases. Tocotrienols (T3s) and quercetin (Q) can be input
for modeling as senolytic agents (e.g., small molecules that
can selectively induce death of senescent cells). Both drugs
are able to kill pre-senescent and senescent cells and can be
used in adjuvant therapy of cancer and in preventive anti-aging
strategies, and thereby can be used in the treatments herein.

The computational models can also consider fibrosis and 
senofibrosis conditions. The term fibrosis describes the
development of fibrous connective tissue as a reparative
response to injury or damage, which can be considered 
during computing for treatment protocols. Fibrosis may 
refer to the connective tissue deposition that occurs as part 
of normal healing or to the excess tissue deposition that 
occurs as a pathological process. The term senofibrosis 
describes the development of fibrous connective tissue under 
influence of senescent cells, which can be considered during 
computing for treatment protocols. Senescent activated cells
lose their proliferative and collagen-producing capacity and
have an increased propensity to produce inflammatory
cytokines compared with replicating activated “normal”
cells. The computational models can focus on two types of
fibrosis and senofibrosis treatment: pulmonary (IPF) and 
liver. 

The models can be processed to consider that fibrosis is a 
wound healing response that produces and deposits extra- 
cellular matrix (ECM) proteins including collagen fibers, 
causing tissue scarring. The liver usually regenerates after
liver injury. However, when liver injury and inflammation are
persistent and progressive, the liver cannot regenerate
normally, and fibrosis results. Hepatic stellate cells (HSCs) are the
primary source of activated myofibroblasts that produce 
extracellular matrix in the liver. Progressive liver fibrosis 
results in cirrhosis where liver cells cannot function properly 
due to the formation of fibrous scar and regenerative nodules 
and the decreased blood supply to the liver. The model can 
perform such simulations. The model can consider three 
main causes of liver fibrosis: alcoholic fatty liver disease;
non-alcoholic fatty liver disease; and viral hepatitis. In each case
different mechanisms lead to fibrotic tissue formation, which 
mechanisms can be processed to determine a suitable pro- 
tocol. 

The model can also consider that quiescent HSCs store 
Vitamin A-containing lipid droplets, and HSCs lose lipid 
droplets when they are activated. Transforming growth 
factor (TGF)-β and platelet-derived growth factor (PDGF)
are two major cytokines that contribute to HSC activation 
and proliferation, resulting in activation into myofibroblasts. 
Many other cytokines, intracellular signaling, and transcrip- 
tion factors are involved in this process, and may be con- 
sidered during computations. 

The computational models can also consider activation 
and regression of hepatic stellate cells. Quiescent hepatic 
stellate cells (HSCs) store Vitamin A containing lipid drop- 
lets and lose Vitamin A when the cells are activated. Hepatic 
epithelial injury, such as death of hepatocytes and biliary 
epithelial cells, induces activation of HSCs directly or 
through cytokines released from immune cells including 
Kupffer cells, bone marrow-derived monocytes, Th17 cells, 
and innate lymphoid cells (ILC). Transforming growth
factor-β (TGF-β), platelet-derived growth factor (PDGF),
interleukin-1β (IL-1β), IL-17, and intestine-derived
lipopolysaccharide (LPS) promote HSC activation. IL-33 promotes
HSC activation through ILC2. Autophagy in HSCs is asso- 
ciated with HSC activation. The activated myofibroblast 
pool is mainly constituted by activated HSCs, but biliary 
injury induces differentiation of portal fibroblasts to acti- 
vated myofibroblasts. However, there is no evidence of 
epithelial-mesenchymal transition for constituting the myo- 
fibroblast pool. After the cessation of causative liver injury, 
fibrosis starts to regress, and activated HSCs undergo
apoptosis or revert into a quiescent state. Peroxisome
proliferator-activated receptor γ (PPARγ) expression in HSCs is
associated with HSC reversal. Some activated HSCs become 
senescent, resulting in loss of profibrogenic property in 
which p53 plays a role. Moreover, angiogenesis contributes
to both fibrosis development and regression. As such, each
may be considered when computing a therapeutic protocol. 

The main pathways that are involved in modulation of 
hepatic inflammation can be categorized as (1) Upregulated 
and (2) Downregulated. The main pathways that are 
involved in formation of cellular senescence in HSCs can be 
categorized as (1) Upregulated and (2) Downregulated. Both 
upregulation and downregulation of any biological pathway, 
such as those described herein, may be considered during the 
computation of therapeutic protocols. 

The main pathways which are involved in formation of the
cellular senescence phenotype in primary human hepatocytes
(PHH) can be analyzed. Data for the analysis is taken from
the LINCS transcriptomic dataset and computed as described herein.
Methanesulfonate is a DNA damage/senescence inducer, 
which may be used in obtaining data to train the models. 
Liver senescence and liver fibrosis signatures hold
common features on the pathway level (analysis is based on
the gene expression data using iPANDA, as described
further below).

The main pathways which are involved in formation of the
cellular senescence phenotype in primary human hepatocytes
(PHH) can be categorized as up-regulated and down-regulated.
Data for the analysis, and model computations for
determining a therapeutic protocol, can be taken from the
LINCS transcriptomic dataset. The following are Up-regulated:
BRCA1 Pathway Homologous Recombination
Repair; JNK Pathway Insulin Signaling; Caspase Cascade 
Pathway Activated Tissue Trans-glutaminase; JNK Pathway 
Gene Expression Apoptosis Inflammation Tumorigenesis 
Cell Migration via SMAD4, STAT4, HSF1, TP53, MAP2, 
DCX, ATF2, NFATC3, SPIRE1, MAP1B, TCF15, ELK1,
BCL2, JUN, PXN, and NFATC2; Caspase Cascade Pathway
DNA Fragmentation; TRAF Pathway Gene Expression via
FOS and JUN; HIF1Alpha Pathway Gene Expression via JUN
and CREB3; TNF Signaling Pathway Apoptosis; PTEN 
Pathway Genomic Stability; VEGF Pathway Gene Expres- 
sion and Cell Proliferation via MAPK7; ErbB Family Path- 
way Gene Expression via JUN, FOS, and ELK1; PTEN 
Pathway Ca2+ Signaling; PTEN Pathway DNA Repair; 
VEGF Pathway Prostaglandin Production; MAPK Family 
Pathway Gene Expression via ATF2, JUN, ELK1, NFKB2, 
and CREB3; HIF1Alpha Pathway; WNT Pathway; ATM 
Pathway Cell Survival; and MAPK Family Pathway Trans- 
lation. The following are Down-regulated: Ras Pathway 
Increased T-cell Adhesion; HGF Pathway Cell Adhesion and 
Cell Migration; IGF1R Signaling Pathway Cell Migration;
ILK Signaling Pathway Cell Migration Retraction; ILK 
Signaling Pathway Cell Cycle Proliferation; ILK Signaling 
Pathway G2 Phase Arrest; ILK Signaling Pathway Cyto- 
skeletal Adhesion Complexes; ILK Signaling Pathway Loss 
of Occludin Barrier Dysfunction; ATM Pathway Cell Cycle 
Checkpoint Control; Akt Signaling Pathway AR mediated 
apoptosis; Akt Signaling Pathway Apoptosis; Akt Signaling 
Pathway Cell Cycle Progression; and Akt Signaling Path- 
way Elevation of Glucose Import. The role of senescence of 
HSCs in liver fibrosis may be computed, and experimental 
data using cell-specific genetic modifications to HSCs from 
experimental models of liver fibrosis in vivo can be used in 
the computation of treatment protocols. 

There is still no treatment for liver fibrosis. The only way
to avoid it is to prevent massive inflammation by rescuing or
killing pre-senescent and senescent cells accordingly. Liver
senescence and liver fibrosis signatures hold common
features on the pathway level (analysis is based on the gene
expression data using the iPANDA package). The common
significant pathways involved in modulation of liver fibrosis
(and cirrhosis) that can be considered in the computation
models include the following upregulated and downregulated
pathways. Those upregulated include: ILK Signaling
Pathway Opsonization; ILK Signaling Pathway Cell Adhe- 
sion; ILK Signaling Pathway Wound Healing; Akt Signaling 
Pathway AR mediated apoptosis; TRAF Pathway; IL-10 
Pathway Stability Determination; EGF Pathway Rab5 Regu- 
lation Pathway; TRAF Pathway Gene Expression via FOS 
and JUN; ILK Signaling Pathway Tumor Angiogenesis; Akt 
Signaling Pathway NF-kB dependent transcription; 
HIF1Alpha Pathway Gene Expression via JUN and CREB3; 
Chemokine Pathway; STAT3 Pathway Growth Arrest and 
Differentiation; TRAF Pathway Apoptosis; Erythropoietin 
Pathway GPI Hydrolysis and Ca2+ influx; IL-10 Pathway;
IL-10 Pathway Inflammatory Cytokine Genes Expression 
via STAT3; ILK Signaling Pathway MMP2 MMP9 Gene 
Expression Tissue Invasion via FOS; ErbB Family Pathway 
Gene Expression via JUN, FOS, and ELK1; Akt Signaling 
Pathway Regulation of Na+ Transport; PAK Pathway Pax- 
illin Disassembly; ILK Signaling Pathway Cytoskeletal 
Adhesion Complexes; cAMP Pathway Glycogen Synthesis; 
and ILK Signaling Pathway Cell Migration Retraction. 
Those downregulated include: STAT3 Pathway Anti-Apop- 
tosis; Akt Signaling Pathway Cell Cycle Progression; Cir- 
cadian Pathway; Growth Hormone Signaling Pathway Pro- 
tein Synthesis; and PTEN Pathway Migration. 

The common significant pathways involved in formation 
of cellular senescence and liver fibrosis that can be com- 
puted include those that are upregulated and downregulated. 
Those upregulated include: ErbB Family Pathway Gene 
Expression via JUN, FOS, and ELK1; HIF1Alpha Pathway 
Gene Expression via JUN and CREB3; and TRAF Pathway 
Gene Expression via FOS and JUN. Those downregulated 
include Akt Signaling Pathway Cell Cycle Progression. The 
common significant pathways involved in the modulation of
IPF include those upregulated or downregulated. Those 
upregulated include: Cellular Apoptosis Pathway; KEGG 
Choline metabolism in cancer Main Pathway; KEGG Pros- 
tate cancer Main Pathway; NCI CXCR4 mediated signaling 
events Main Pathway; NCI Syndecan 4 mediated signaling 
events Main Pathway; NCI TRAIL signaling Main Pathway; 
NCI Validated transcriptional targets of deltaNp63 isoforms 
Main Pathway; NCI Validated transcriptional targets of 
deltaNp63 isoforms Pathway (Pathway degradation of 
TP63); PTEN Pathway Adhesion or Migration; PTEN Path- 
way Angiogenesis and Tumorigenesis; PTEN Pathway Ca2+ 
Signaling; reactome Collagen biosynthesis and modifying 
enzymes Main Pathway; and reactome SMAD2, SMAD3, 
and SMAD4, heterotrimer regulates transcription Main 
Pathway. Those downregulated include: Growth Hormone 
Signaling Pathway Gene Expression via SRF, ELK1,
STAT5B, CEBPD, STAT1, STAT3; and reactome Tie2 Signaling
Main Pathway.

The common significant pathways involved in formation 
of cellular senescence in lung tissue can include those 
upregulated and downregulated. Those upregulated include: 
Growth Hormone Signaling Pathway Gene Expression via 
SRF, ELK1, STAT5B, CEBPD, STAT1, STAT3; KEGG
Choline metabolism in cancer Main Pathway; KEGG Pros- 
tate cancer Main Pathway; NCI CXCR4 mediated signaling 
events Main Pathway; NCI TRAIL signaling Main Pathway; 
PTEN Pathway Adhesion or Migration; PTEN Pathway 
Angiogenesis and Tumorigenesis; PTEN Pathway Ca2+ 
Signaling; reactome Collagen biosynthesis and modifying 
enzymes Main Pathway; reactome SMAD2, SMAD3, 
SMAD4 heterotrimer regulates transcription Main Pathway; 
and reactome Tie2 Signaling Main Pathway. Those downregulated
include: Cellular Apoptosis Pathway; NCI Syndecan 4
mediated signaling events Main Pathway; NCI Validated
transcriptional targets of deltaNp63 isoforms Main
Pathway; and NCI Validated transcriptional targets of deltaNp63
isoforms Pathway (Pathway degradation of TP63).

Cellular senescence can contribute to accelerating organ 
aging, and, among the pulmonary diseases that can be 
related to pulmonary senescence, chronic obstructive pul- 
monary disease/emphysema (COPD) and idiopathic pulmo- 
nary fibrosis (IPF), are the most common and lethal. COPD 
and IPF are severe multifactorial pulmonary disorders char- 
acterized by distinct clinical and pathologic features 
(“Global Strategy for the Diagnosis, Management, and Prevention
of Chronic Obstructive Pulmonary Disease: GOLD
Executive Summary Updated 2003” 2004; Noble et al.
2011). The data regarding clinical and pathological features
can be used in the computational models that are processed 
for determining the therapeutic protocols. 

In all known types of cellular senescence, including 
replicative cellular senescence, stress-induced senescence, 
and oncogene-induced senescence, a permanent state of cell 
cycle arrest occurs that is mediated by the expression of 
p16INK4a and p21WAF1, two cell cycle inhibitors that are
also well-recognized markers used to investigate this mechanism
in vivo (Kim and Sharpless 2006; Campisi 2005; Mallette 
and Ferbeyre 2007; Ohtani et al. 2004; Takeuchi et al. 2010). 
Altered expression of p16INK4a, p21WAF1, and β-galactosidase
(a widely used histochemical marker of cellular
senescence) has been demonstrated in IPF (Minagawa et al.
2010; Kuwano et al. 1996; Lomas et al. 2012). These
markers are expressed strongly at sites of alveolar damage
and hyperplasia, as well as in fibroblast foci localized in
discrete clusters of bronchiolar basal cells coexpressing the
laminin-5-γ2 chain (LAM5γ2) and heat shock protein 27
(Hsp27) (Chilosi et al. 2006). According to a review (Chilosi
et al. 2013), several factors lead to senescence in the lungs, and
they differ between the pathogenesis of idiopathic pulmonary
fibrosis and that of chronic obstructive pulmonary
disease/emphysema. This information may also be used in the
computational models for determining therapeutic protocols.

Methods for development of senescence drug treatments,
that is, the selection of drugs, dosages, and cycles, are
described herein. This section gives an overview of the
drug treatments themselves, that is, the application to the
patient, in a preferred embodiment, of the personalized
treatments once they have been designed. In that patient, a
tissue or organ is identified to which the senescence treatment
will be applied.

In a preferred embodiment, one phase of the treatment
involves senoremediation, that is, a drug protocol of
senoremediators, which are drugs that restore or increase the
amount of presenescent cells (cells that are typical of a
young, healthy tissue or organ). Another phase of the treatment
involves senolytic treatment, that is, a drug protocol
that involves elimination or
destruction of senescent cells in the tissue or organ of
interest.

In another preferred embodiment, there is also an antifibrotic
phase, that is, a drug protocol that addresses fibrotic
cells in the tissue or organ of interest. Antifibrotic treatment
may involve restoring senescent cells to a pre-senescent,
non-fibrotic state, elimination or destruction of fibrotic cells, or
both.

Since such drug treatment protocols are highly specific, 
and based upon the classification vectors of the analyses 
described herein, they may take many forms. Methods in the
art, such as Seim et al., “Gene expression signatures of
human cell and tissue longevity”, npj Aging and Mechanisms
of Disease, 2, 16014 (2016), address transcriptome
changes/differences associated with senescence that are used
to classify drug protocols.

To examine gene expression strategies that support the 
lifespan of different cell types within the human body, one 
can obtain available RNA-seq data sets and interrogate the
transcriptomes of various somatic cell types and tissues with 
reported cellular turnover, along with an estimate of lifes- 
pan, ranging from 2 days (monocytes) to effectively a 
lifetime (neurons). Across different cell lineages, one can 
obtain a gene expression signature of human cell and tissue 
turnover. In particular, turnover showed a negative correla- 
tion with the energetically costly cell cycle and factors 
supporting genome stability, concomitant risk factors for 
aging-associated pathologies. 

Comparative transcriptome studies of long-lived and 
short-lived mammals, and analyses that examined the lon- 
gevity trait across a large group of mammals (tissue-by- 
tissue surveys, focusing on brain, liver and kidney), have 
revealed candidate longevity-associated processes. Publicly 
available transcriptome data sets (for example, RNA-seq) 
generated by consortia, such as the Human Protein Atlas 
(HPA) can be used. They offer an opportunity to understand 
how gene expression programs are related to cellular turn- 
over, as a proxy for cellular lifespan. Gene expression 
patterns are typically analyzed, in a preferred embodiment, 
using Principal Component Analysis (PCA), as a first step. 

The present invention involves examining the aging transcriptome,
in which the transcribed genes in old and young
people are compared to define a first set of genes (activated)
which are more strongly expressed in old people relative to
young people and a second set of genes (repressed) which
are less strongly expressed in old people relative to young
people. A preferred embodiment is herein described.

A rating approach can be used to rank the senescence-treating
properties of treatments. The approach first involves
collecting the transcriptome datasets from young and old
patients and normalizing the data for each cell and tissue type,
then evaluating the pathway activation strength (PAS) for each
individual pathway, constructing the pathway cloud, and
screening for drugs or combinations that minimize the signaling
pathway cloud disturbance by acting on one or multiple elements
of the pathway cloud. Drugs and combinations may be rated by
their ability to return the signaling pathway activation pattern
closer to that of the younger tissue samples. The
predictions may then be tested both in vitro and in vivo on
human cells and on model organisms such as rodents,
nematodes and flies to validate the screening and rating
algorithms.

In a preferred embodiment of the senescence treatment, a
method for ranking drugs includes: a. collecting
young subject transcriptome data and old subject transcriptome
data for one species to evaluate pathway activation
strength (PAS) and down-regulation strength for a
plurality of biological pathways; b. mapping the plurality of
biological pathways for the activation strength and down-regulation
strength from old subject samples relative to
young subject samples to form a pathway cloud map; and c.
providing a rating for each of a plurality of drugs in
accordance with a drug rating for minimizing signaling
pathway cloud disturbance (SPCD) in the pathway cloud
map of the one species to provide a ranking of the drugs.
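
The ranking in step c can be illustrated with a minimal sketch. It assumes, purely for illustration, that each pathway's old-versus-young activation is a signed number (0 meaning no difference from young tissue), that a drug's predicted effect is an additive shift of those values, and that the cloud disturbance is summarized as the sum of absolute activation values. None of these choices is specified in the description above, and the pathway and drug names are hypothetical.

```python
from typing import Dict, List, Tuple


def cloud_disturbance(activation: Dict[str, float]) -> float:
    """Assumed disturbance score: sum of absolute pathway activation values."""
    return sum(abs(value) for value in activation.values())


def rank_drugs(
    old_vs_young: Dict[str, float],
    drug_effects: Dict[str, Dict[str, float]],
) -> List[Tuple[str, float]]:
    """Return (drug, residual disturbance) pairs, lowest disturbance first."""
    ranking = []
    for drug, effect in drug_effects.items():
        # Apply the drug's predicted shift to every pathway it touches.
        treated = {p: a + effect.get(p, 0.0) for p, a in old_vs_young.items()}
        ranking.append((drug, cloud_disturbance(treated)))
    return sorted(ranking, key=lambda item: item[1])


# Hypothetical pathway cloud (old relative to young) and drug effects.
old_vs_young = {"pathway_A": 2.0, "pathway_B": -1.0, "pathway_C": 0.5}
drug_effects = {
    "drug_X": {"pathway_A": -1.5},
    "drug_Y": {"pathway_B": 1.0},
    "drug_Z": {"pathway_A": 1.0},
}
ranking = rank_drugs(old_vs_young, drug_effects)
for drug, score in ranking:
    print(drug, score)
```

A drug that pushes an already activated pathway further from the young state (drug_Z here) ends up at the bottom of the ranking.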

In silico Pathway Activation Network
Decomposition Analysis (iPANDA) is a preferred method
of network analysis for the methods described herein.

Development of senescence treatments (in particular drug 
combinations and protocols), as contemplated by the authors,
is particularly compatible with the signaling pathway activation
network analysis as described, for example, in U.S.
62/401,789 (Ozerov, filed September 2016, now US 2018-0125865)
and Ozerov et al., “In silico Pathway Activation
Network Decomposition Analysis (iPANDA) as a method
for biomarker development”, Nature Communications, 7:
13427, 2016, both incorporated by specific reference in
their entirety. Such methods include large-scale transcriptomic
data analysis that involves in silico Pathway Activation Network
Decomposition Analysis (iPANDA). The capabilities
of this method apply to multiple data sets containing data
obtained, for example, from Gene Expression Omnibus
(GEO). Data sets in GEO are accessed by identifier, or 
accession number, such as GSE5350. 

Additionally, according to an embodiment of the present 
invention, the pathway cloud map shows at least one upregu- 
lated/activated pathway and at least one down-regulated 
pathway of the old subject relative to the young subject. 
Furthermore, according to an embodiment of the present 
invention, the pathway cloud map is based on a plurality of 
young subjects and a plurality of old subjects. Importantly, 
according to an embodiment of the present invention, the 
method is performed for an individual to determine an 
optimized ranking of drugs for the individual. 

Further, according to an embodiment of the present inven- 
tion, the samples or biopsies are bodily samples selected 
from one or more of a blood sample, a urine sample, a 
biopsy, a hair sample, a nail sample, a breath sample, a
saliva sample, or a skin sample. 

Yet further, according to an embodiment of the present 
invention, the pathway activation strength is calculated by 
dividing the expression levels for a gene n in the old subject 
samples by the gene expression levels of the young subject 
samples. 

Additionally, according to an embodiment of the present 
invention, the pathway activation strength is calculated in 
accordance with a formula in which [AGEL]i is the expression
level of activator gene i and [RGEL]j is the expression level
of repressor gene j.

Yet further, according to an embodiment of the present
invention, the method screens for drugs or combinations that
minimize the signaling pathway cloud disturbance (SPCD).
Additionally, according to an embodiment of the present
invention, the SPCD is a ratio of [AGEL]i, which is the
expression level of activator gene i, to [RGEL]j, which is the
expression level of repressor gene j, and this ratio is
calculated for the activator and repressor proteins in the pathway.
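
One possible reading of the activator-to-repressor ratio is sketched below. It assumes, as an illustration only, that each gene contributes its old/young expression ratio, that the activator contributions are multiplied together and divided by the product of the repressor contributions, and that the arithmetic is done in log space for numerical stability. The exact formula is not reproduced in this text, and the gene names are hypothetical.

```python
import math
from typing import Dict, Iterable


def fold_change(old: Dict[str, float], young: Dict[str, float]) -> Dict[str, float]:
    """Per-gene ratio of old-subject to young-subject expression level."""
    return {g: old[g] / young[g] for g in old if g in young and young[g] > 0}


def activator_repressor_ratio(
    fc: Dict[str, float],
    activators: Iterable[str],
    repressors: Iterable[str],
) -> float:
    """Assumed form: product of activator ratios over product of repressor ratios."""
    log_ratio = sum(math.log(fc[g]) for g in activators)
    log_ratio -= sum(math.log(fc[g]) for g in repressors)
    return math.exp(log_ratio)


# Hypothetical expression levels for one pathway.
old = {"ACT1": 8.0, "ACT2": 6.0, "REP1": 2.0}
young = {"ACT1": 4.0, "ACT2": 3.0, "REP1": 4.0}

fc = fold_change(old, young)
score = activator_repressor_ratio(fc, ["ACT1", "ACT2"], ["REP1"])
print(fc, score)
```

Under this reading a value above 1 indicates net activation of the pathway in old tissue, and a value below 1 indicates net suppression.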

Cellular Network Analysis and iPANDA 

There are well-known methods in the art (see, for example,
U.S. Pat. No. 8,623,592) for treating patients with methods
for predicting responses of cells to treatment with therapeutic
agents. These methods involve measuring, in a sample of
the cells, levels of one or more components of a cellular
network and then computing a Network Activation State
(NAS) or a Network Inhibition State (NIS) for the cells
using a computational model of the cellular network. The
response of the cells to treatment is then predicted based on
the NAS or NIS value that has been computed. The present
invention also comprises predictive methods for cellular
responsiveness in which computation of a NAS or NIS value
for the cells (e.g., senescent cells) is combined with use of
a statistical classification algorithm. A preferred method of
iPANDA implementation is now described. The method of
transcriptomic data analysis typically includes: (a) receiving
cell transcriptomic data of a control group (C) and cell
transcriptomic data (S) of a group under study for a gene;
(b) calculating a fold change ratio (fc) for the gene; (c) repeating
steps (a) and (b) for a plurality of genes; (d) grouping co-expressed
genes into modules; and (e) estimating gene importance factors
based on a network topology, mapped from a plurality of the
modules, in order to obtain an in silico Pathway Activation
Network Decomposition Analysis (iPANDA) value, the
iPANDA value having a Pearson coefficient greater than a
Pearson coefficient associated with another platform for
manipulating the control cell transcriptomic data and the cell
transcriptomic data of the group under study for the plurality of
genes. Steps may also include determining an in
silico Pathway Activation Network Decomposition Analysis
(iPANDA) value associated with at least one of the above
modules; providing a classifier for treatment response prediction
of a drug to a disease, wherein the disease is selected
from a senescence and another disease or disorder; applying
at least one statistical filtering test and a statistical threshold
test to the fc values; obtaining proliferative bodily samples
and healthy bodily samples from patients; applying the drug
to the patients; and determining responder and non-responder
patients to the drug. The method also often includes comparing
gene expression in at least one of selected signaling
pathways and metabolic pathways, often associated with a
drug.

One of the most relevant challenges in transcriptomic data 
analysis is the inherent complexity of gene network inter- 
actions, which remains a significant obstacle in building 
comprehensive predictive models. Moreover, high diversity 
of experimental platforms and inconsistency of the data 
coming from the various types of equipment may also lead
to the incorrect interpretation of the underlying biological
processes. Although a number of data normalization
approaches have been proposed over recent years, it
remains difficult to achieve robust results over a group of
independent data sets even when they are obtained from the 
same profiling platform. This may be explained by a range 
of biological factors, such as wide heterogeneity among 
individuals on the population basis, variance in the cell cycle 
stage of the cells used or a set of technical factors, such as 
sample preparation or batch variations in reagents. 

A preferred embodiment of the present invention is com- 
patible with the large-scale transcriptomic data analysis 
called in silico Pathway Activation Network Decomposition
Analysis (iPANDA) as described herein. iPANDA is an
effective tool for biologically relevant dimension reduction 
in transcriptomic data. 

Overview of a Preferred iPANDA Embodiment 

Fold changes between the gene expression levels in the
samples under investigation and an average expression level
of samples within the normal set are used as input data for the
iPANDA algorithm. Since some genes may have a stronger
effect on the pathway activation than others, the gene 
importance factor has been introduced. Several approaches 
of gene importance hierarchy calculation have been pro- 
posed during the last few decades. The vast majority of these 
approaches aim to enrich pathway-based models with specific
gene markers most relevant for a given study. While
some of them use detailed kinetic models of several particular
metabolic networks to derive importance factors, in
others, gene importance is derived from the statistical analysis
of the gene expression data obtained for disease cases and
healthy samples.

The iPANDA approach integrates different analytical con- 
cepts described above into a single network model as it 
simultaneously exploits statistical and topological weights 
for gene importance estimation. The smooth threshold based 
on the P values from a t-test performed on groups of two 
contrasting tissue samples is applied to the gene expression 
values. The smooth threshold is defined as a continuous 
function of P value ranging from 0 to 1. The statistical 
weights for genes are also derived during this procedure. 
The topological weights for genes are obtained during the 
pathway map decomposition. The topological weight of 
each gene is proportional to the number of independent 
paths through the pathway gene network represented as a 
directed graph. 
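
The two kinds of weights can be sketched as follows. The sketch assumes a log-linear ramp between two illustrative P value cut-offs for the smooth threshold, and it counts distinct source-to-sink paths through each node of an acyclic pathway graph as a stand-in for "independent paths". Both choices, the cut-off values, and the node names are assumptions made for illustration, not details given in the text.

```python
import math
from typing import Dict, List


def smooth_threshold(p_value: float, p_min: float = 1e-7, p_max: float = 1e-1) -> float:
    """Continuous statistical weight in [0, 1] (assumed log-linear ramp)."""
    if p_value <= p_min:
        return 1.0
    if p_value >= p_max:
        return 0.0
    return (math.log(p_max) - math.log(p_value)) / (math.log(p_max) - math.log(p_min))


def topological_weights(edges: Dict[str, List[str]]) -> Dict[str, float]:
    """Weight each node by the number of source-to-sink paths passing through it.

    `edges` maps a node to its downstream nodes; the graph must be acyclic.
    Weights are scaled so that the largest equals 1.
    """
    nodes = set(edges) | {t for targets in edges.values() for t in targets}
    parents: Dict[str, List[str]] = {n: [] for n in nodes}
    for source, targets in edges.items():
        for target in targets:
            parents[target].append(source)

    down: Dict[str, int] = {}
    up: Dict[str, int] = {}

    def paths_down(node: str) -> int:
        # Number of paths from this node to any sink.
        if node not in down:
            children = edges.get(node, [])
            down[node] = 1 if not children else sum(paths_down(c) for c in children)
        return down[node]

    def paths_up(node: str) -> int:
        # Number of paths from any source to this node.
        if node not in up:
            up[node] = 1 if not parents[node] else sum(paths_up(p) for p in parents[node])
        return up[node]

    counts = {n: paths_up(n) * paths_down(n) for n in nodes}
    largest = max(counts.values())
    return {n: c / largest for n, c in counts.items()}


# Hypothetical pathway: one receptor, two kinases, one transcription factor, two targets.
pathway = {
    "RECEPTOR": ["KINASE_A", "KINASE_B"],
    "KINASE_A": ["TF"],
    "KINASE_B": ["TF"],
    "TF": ["TARGET_1", "TARGET_2"],
}
weights = topological_weights(pathway)
stat_weight = smooth_threshold(1e-4)
print(weights, stat_weight)
```

In this toy graph every path passes through the receptor and the transcription factor, so they receive the largest topological weight, while each kinase carries half of the paths.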

It is well known that multiple genes exhibit considerable 
correlations in their expression levels. Most algorithms for 
pathway analysis treat gene expression levels as independent 
variables, which, despite the common belief, is not suitable 
when the topology based coefficients are applied. Indeed, 
due to exchangeability, there is no dependence of pathway 
activation values on how the topology weights are distrib- 
uted over a set of coexpressed genes with correlated expres- 
sion levels, and hence correlated fold changes. Thus, the 
computation of topological coefficients for a set of coex- 
pressed genes is inefficient, unless a group of coexpressed 
genes is being considered as a single unit. To circumvent this 
challenge, gene modules reflecting the coexpression of 
genes are introduced in the iPANDA algorithm. The wide
database of gene coexpression in human samples, COXPRESdb,
and the database of the downstream genes controlled
by various transcriptional factors are utilized for
grouping genes into modules. In this way, the topological
coefficients are estimated for each gene module as a whole
rather than for individual genes inside the module.

The contribution of gene units (including gene modules
and individual genes) to pathway activation is computed as
a product of their fold changes in logarithmic scale and their
topological and statistical weights. Then the contributions are
multiplied by a discrete coefficient which equals +1 or -1
in the case of pathway activation or suppression by the
particular unit, respectively. Finally, the activation scores,
which we refer to as iPANDA values, are obtained as a linear
combination of the scores calculated for gene units that
contribute to the pathway activation/suppression. Therefore,
the iPANDA values represent the signed scores showing the
intensity and direction of pathway activation.
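
The scoring step can be sketched as a signed, weighted sum. The sketch assumes base-2 logarithms, an unweighted sum as the linear combination, and a role coefficient of +1 for units that activate the pathway and -1 for units that suppress it. The unit names and numbers are hypothetical, and the weights are taken as given rather than derived.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class GeneUnit:
    name: str
    fold_change: float          # expression ratio versus the normal set, must be > 0
    topological_weight: float   # from the pathway map decomposition
    statistical_weight: float   # from the smooth P value threshold, in [0, 1]
    role: int                   # +1 activates the pathway, -1 suppresses it (assumed)


def unit_contribution(unit: GeneUnit) -> float:
    """Signed contribution of one gene or gene module to the pathway score."""
    return (
        unit.role
        * math.log2(unit.fold_change)
        * unit.topological_weight
        * unit.statistical_weight
    )


def pathway_score(units: List[GeneUnit]) -> float:
    """Assumed linear combination: plain sum of the unit contributions."""
    return sum(unit_contribution(u) for u in units)


units = [
    GeneUnit("activator_up", 4.0, 1.0, 1.0, +1),    # contributes +2.0
    GeneUnit("repressor_down", 0.5, 0.5, 1.0, -1),  # contributes +0.5
    GeneUnit("not_significant", 2.0, 1.0, 0.0, +1), # filtered out by its weight
]
score = pathway_score(units)
print(score)
```

A down-regulated repressor adds to the score just as an up-regulated activator does, which is what makes the result a signed measure of net pathway activation.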

Pathway Quality Metrics and iPANDA 

Although currently there are several publicly available 
pipelines for benchmarking the transcriptomic data analysis 
algorithms, our aim is to generalize the approaches for 
pathway-based algorithm testing and reveal the common 
features of reliable pathway-based expression data analysis. 
We term these features “pathway analysis quality hallmarks”.
Efficient methods for pathway-based transcriptomic
data analysis should be capable of performing significant
noise reduction in the input data and of aggregating output data
as a small number of highly informative features (pathway
markers).

Scalability (the ability to process pathways with small or 
large numbers of genes similarly) is another critical aspect 
that should be considered when designing a reliable pathway 
analysis approach, since pathway activation values for pathways
of different sizes should be equally credible. The list of
pathway markers identified should be relevant to the specific 
phenotype or medical condition, and robust over multiple 
data sets related to the process or biological state under 
investigation. The calculation time should be reasonable to 
allow high-throughput screening of large transcriptomic 
data sets. To evaluate the iPANDA algorithm with respect to
these hallmarks and to fully assess its true potential and
limitations, we have directly compared the results obtained
by iPANDA using the tissue and Microarray Analysis Quality
Control (MAQC)-I data sets with five other widely used
third-party alternatives: GSEA, SPIA, Pathway
Level Analysis of Gene Expression (PLAGE), single
sample Gene Set Enrichment Analysis (ssGSEA) and
Denoising Algorithm based on Relevant network Topology
(DART).

iPANDA as a Tool for Noise Reduction in Transcriptomic 
Data 

One of the major issues that should be addressed when 
developing a novel transcriptomic data analysis algorithm is 
the ability of the proposed method to reduce noise while 
retaining the biologically relevant information of the results. 
Since pathway-based analysis algorithms are considered 
dimension reduction techniques, the pathway activation 
scores should represent collective variables describing only 
biologically significant changes in the gene expression pro- 
file. 

In order to estimate the ability of the iPANDA algorithm 
to perform noise reduction while preserving biologically 
relevant features, we performed an analysis of the well- 
known MAQC data set (GEO identifier GSE5350). It con- 
tains data for the same cell samples processed using various 
transcriptome profiling platforms. A satisfactory pathway or 
network analysis algorithm should reduce the noise level 
and demonstrate a higher degree of similarity between the 
samples in comparison to the similarity calculated using 
gene set data. 

To estimate gene-level similarity, only fold changes for
differentially expressed genes (t-test P value<0.05) were
utilized. Pearson correlation is chosen as the metric to measure
the similarity between samples. Sample-wise correlation
coefficients were obtained for the same samples profiled on
Affymetrix and Agilent platforms. A similar procedure is
performed using pathway activation values (iPANDA values).
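
The sample-wise comparison can be sketched with a plain Pearson correlation. The sketch assumes each platform's data is a mapping from sample identifier to a feature vector (fold changes or pathway activation values) with features in the same order on both platforms. The sample names and numbers are made up for illustration.

```python
import math
from typing import Dict, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length vectors."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("vectors must have the same length of at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return cov / math.sqrt(var_x * var_y)


def mean_samplewise_correlation(
    platform_a: Dict[str, Sequence[float]],
    platform_b: Dict[str, Sequence[float]],
) -> float:
    """Average correlation over the samples profiled on both platforms."""
    shared = sorted(set(platform_a) & set(platform_b))
    if not shared:
        raise ValueError("no shared samples")
    return sum(pearson(platform_a[s], platform_b[s]) for s in shared) / len(shared)


# Hypothetical feature vectors for two samples on two platforms.
platform_a = {"sample_1": [1.0, 2.0, 3.0], "sample_2": [1.0, 0.0, -1.0]}
platform_b = {"sample_1": [2.0, 4.0, 6.0], "sample_2": [-1.0, 0.0, 1.0]}

r_sample_1 = pearson(platform_a["sample_1"], platform_b["sample_1"])
mean_r = mean_samplewise_correlation(platform_a, platform_b)
print(r_sample_1, mean_r)
```

Running the same function once on gene-level fold changes and once on pathway-level values gives the two similarity figures that the comparison relies on.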

Notably, the similarity calculated using pathway activa- 
tion values generated by the iPANDA algorithm signifi- 
cantly exceeds the one calculated using fold changes for the 
differentially expressed genes (mean sample-wise correla- 
tion is over 0.88 and 0.79, respectively). To further validate 
our algorithm, we directly compared its noise reduction 
efficacy with that of other routinely used methods for 
transcriptome-based pathway analysis, such as SPIA, 
GSEA, ssGSEA, PLAGE and DART. 

The mean sample-wise correlation between platforms is 
0.88 for iPANDA compared with 0.53 for GSEA, 0.84 for 
SPIA, 0.69 for ssGSEA, 0.67 for PLAGE and 0.41 for 
DART. Furthermore, the sample-wise correlation distribution
obtained using iPANDA values is narrowed to a range
of 0.79 to 0.94, compared with -0.08 to 0.80, 0.60 to 0.92,
0.61 to 0.74, 0.45 to 0.75 and -0.11 to 0.60 for GSEA, SPIA,
ssGSEA, PLAGE and DART, respectively.

In a preferred embodiment, iPANDA generally assigns
more weight to genes that tend to be reliably
coexpressed, using information from the COXPRESdb database.
The information from COXPRESdb is utilized
solely for grouping genes into modules, and hence cannot
introduce any favorable bias towards iPANDA in this assessment.
Even when the feature for grouping genes into modules
is ‘switched off’, meaning that all genes are considered
individually and no information from COXPRESdb is
being utilized, iPANDA scores show higher sample-wise
similarity between data obtained using various profiling
platforms compared with the similarity calculated on the
gene level.

Biomarker Identification and Relevance and iPANDA 

As a next step we address the ability of iPANDA to identify
potential biomarkers (or pathway markers) of the phenotype
under investigation. One of the commonly used methods to
assess the capability of transcriptomic pathway markers to
distinguish between two groups of samples (for example,
resistance and sensitivity to treatment) is to measure their
receiver operating characteristic area under the curve (AUC)
values. The capacity to generate a high number of biomarkers
with high AUC values is a major requirement for any
prospective transcriptomic data analysis algorithm to be 
used in prediction models. 
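
For a single pathway marker, the AUC can be computed directly from its definition as the probability that a sample from one group scores higher than a sample from the other. The sketch below uses that pairwise definition, with ties counted as one half; the scores are hypothetical.

```python
from typing import Sequence


def roc_auc(scores_positive: Sequence[float], scores_negative: Sequence[float]) -> float:
    """Probability that a positive sample outscores a negative one (ties count 0.5)."""
    if not scores_positive or not scores_negative:
        raise ValueError("both groups must contain at least one sample")
    wins = 0.0
    for p in scores_positive:
        for n in scores_negative:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_positive) * len(scores_negative))


# Hypothetical activation scores of one pathway in two groups of samples.
responders = [2.1, 1.5, 0.3]
non_responders = [0.2, 0.9, -1.0]

auc = roc_auc(responders, non_responders)
# A marker is equally informative if it separates the groups in the other direction.
marker_strength = max(auc, 1.0 - auc)
print(auc, marker_strength)
```

An AUC of 0.5 means the pathway carries no information about group membership, while values near 0 or 1 both indicate a strong marker.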

iPANDA Produces Highly Robust Set of Biomarkers 

One of the most important shortcomings of modern 
pathway analysis approaches is their inability to produce 
consistent results for different data sets obtained independently
for the same biological case. Here we show that
the iPANDA algorithm applied to the tissue data overcomes this
flaw and produces a highly consistent set of pathway markers
across the data sets used in the study. The iPANDA algorithm
is an advantageous method for biologically relevant
pathway marker development compared with the other path- 
way-based approaches. 

The common marker pathway (CMP) index is applied to
drug treatment response data in order to estimate the
robustness of the biomarker lists. Pathway marker lists
obtained for four independent data sets were analyzed. The
calculation of pathway activation scores is performed using
the iPANDA algorithm and its versions with disabled gene
grouping and/or topological weights. The ‘off’ state of
topology coefficients means that they are equal to 1 for all
genes during the calculation. Also, the ‘off’ state for the gene
grouping means that all the genes are treated as individual
genes. The application of the gene modules without topology-based
coefficients reduces the robustness of the algorithm
as well as the overall number of common pathway
markers between data sets. Turning on the topology-based
coefficients only slightly increases the robustness of the
algorithm, whereas using topology and gene modules
simultaneously dramatically improves this parameter for
both tissue types. This result implies that the combined
implementation of the gene modules along with the topol- 
ogy-based coefficients serves as an effective way of noise 
reduction in gene expression data and allows one to obtain 
stable pathway activation scores for a set of independent 
data. 

iPANDA Biomarkers as Classifiers for Prediction Models

High AUC values for the pathway markers suggest
that iPANDA scores may be efficiently used as classifiers for
biological condition prediction challenges.

In order to classify the samples as responders or non-responders,
random forest models were developed using
iPANDA scores obtained for training sets of samples for
each end point. Subsequently, the performance of these models
is measured using validation sets. The Matthews correlation
coefficient (MCC), specificity and sensitivity metrics were
applied to evaluate performance of the models. The MCC
metric was chosen for its ease of calculation and its
informativeness even when the distribution of the two
classes is highly skewed. Similar random forest models
were built using pathway activation (enrichment) scores
obtained by other pathway analysis algorithms, including
SPIA, GSEA, DART, ssGSEA and PLAGE. Moreover, to
fully assess the performance of iPANDA-based paclitaxel
sensitivity prediction models, we have trained similar
random forest models on four different gene expression
subsets: expression levels of all genes (log GE), fold change
for all genes between the training set and corresponding
normals (log FC), expression levels of most differentially
expressed genes (t-test P<0.05) (log DGE), and fold change
in expression levels of most differentially expressed genes
(t-test P<0.05) between the training and corresponding normal
breast tissue data sets (log DFC). Logarithmic scale is
used for training the gene level models. All pathway-level
and gene-level data are Z-score normalized separately for
each GEO data set used.
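
The evaluation metrics and the per-data-set normalization can be sketched as below. The sketch assumes binary labels (1 for responder, 0 for non-responder) and uses the population standard deviation for the Z-score, which is an arbitrary choice since the text does not say which estimator was used. The labels and values are hypothetical, and no random forest is trained here.

```python
import math
import statistics
from typing import List, Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (tp, tn, fp, fn) for binary labels coded as 1 and 0."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def matthews_cc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient; 0.0 when the denominator vanishes."""
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denominator if denominator else 0.0


def sensitivity(tp: int, fn: int) -> float:
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    return tn / (tn + fp)


def z_scores(values: Sequence[float]) -> List[float]:
    """Z-score normalization of one feature within a single data set."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return [(v - mean) / sd for v in values] if sd else [0.0] * len(values)


# Hypothetical validation labels and model predictions.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1]

tp, tn, fp, fn = confusion_counts(y_true, y_pred)
mcc = matthews_cc(tp, tn, fp, fn)
sens = sensitivity(tp, fn)
spec = specificity(tn, fp)
normalized = z_scores([1.0, 2.0, 3.0])
print(mcc, sens, spec, normalized)
```

Because the MCC uses all four cells of the confusion matrix, it stays informative when one class is much larger than the other, which is the property the text cites for choosing it.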

Application of the pathway activation measurement 
implemented in iPANDA leads to significant noise reduction 
in the input data and hence enhances the ability to produce 
highly consistent sets of biologically relevant biomarkers 
acquired on multiple transcriptomic data sets. Another 
advantage of the approach presented is the high speed of the 
computation. The gene grouping and topological weights are 
the most demanding parts of the algorithm from the perspective
of computational resources. Fortunately, these steps need to
be precalculated only once, before the actual calculations
using transcriptomic data. The calculation time for a single
sample is approximately 1.4 s on the Intel® Core
i3-3217U 1.8 GHz CPU (compared with 10 min for SPIA,
4 min for DART, about 10 s for ssGSEA, GSEA and
PLAGE). Thus, iPANDA can be an efficient tool for high- 
throughput biomarker screening of large transcriptomic data 
sets. 

The use of merely microarray data for pathway activation 
analysis has well-known limitations, as it cannot address 
individual variations in the gene sequence and consequently 
in the activity of its product. For example, a gene can have 
a mutation that reduces activity of its product but elevates its 
expression level through a negative feedback loop. Thus, the 
elevated expression of the gene does not necessarily correspond
to an increase in the activity of its product.

Although the iPANDA algorithm is initially designed for 
microarray data analysis, it can also be easily applied to the 
data derived from genome-wide association studies 
(GWAS). In order to do so, GWAS data can be converted to 
a form amenable for the iPANDA algorithm. Single-point 
mutations are assigned to the genes based on their proximity 
to the reading frames. Then each single-point mutation is 
given a weight derived from a GWAS data statistical analysis.
Simultaneous use of the GWAS data along with
microarray data may improve the predictions made by the
iPANDA method.
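
The conversion of GWAS data can be sketched as a nearest-gene assignment. The sketch assumes a single chromosome, treats each gene as one interval standing in for its reading frame, ignores mutations farther than an arbitrary distance cut-off from any gene, and sums the weights of the mutations assigned to a gene. All of these are illustrative assumptions, and the identifiers and coordinates are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Gene = Tuple[str, int, int]    # (name, start, end) of the reading frame
Snp = Tuple[str, int, float]   # (identifier, position, GWAS-derived weight)


def distance_to_gene(position: int, gene: Gene) -> int:
    """Zero inside the gene interval, otherwise distance to the nearer boundary."""
    _, start, end = gene
    if start <= position <= end:
        return 0
    return min(abs(position - start), abs(position - end))


def nearest_gene(position: int, genes: List[Gene], max_distance: int) -> Optional[str]:
    """Name of the closest gene, or None if it lies beyond max_distance."""
    best = min(genes, key=lambda g: distance_to_gene(position, g))
    return best[0] if distance_to_gene(position, best) <= max_distance else None


def gene_weights(snps: List[Snp], genes: List[Gene], max_distance: int = 10_000) -> Dict[str, float]:
    """Sum the weights of the single-point mutations assigned to each gene."""
    totals: Dict[str, float] = {}
    for _, position, weight in snps:
        gene = nearest_gene(position, genes, max_distance)
        if gene is not None:
            totals[gene] = totals.get(gene, 0.0) + weight
    return totals


genes = [("GENE_A", 1_000, 2_000), ("GENE_B", 50_000, 52_000)]
snps = [
    ("snp_1", 1_500, 0.8),    # inside GENE_A
    ("snp_2", 2_500, 0.3),    # 500 bases downstream of GENE_A
    ("snp_3", 30_000, 0.9),   # too far from either gene, ignored
    ("snp_4", 49_500, 0.5),   # 500 bases upstream of GENE_B
]
weights_by_gene = gene_weights(snps, genes)
print(weights_by_gene)
```

The resulting per-gene weights could then be combined with expression-based weights, although how the two are combined is not described in the text.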

One of the rapidly emerging areas in biomedical data 
analysis is deep learning. Recently several successful studies 
on microarray data analysis using various deep learning 
approaches on gene-level data have surfaced. Using path- 
way activation scores may be an efficient way to reduce 
dimensionality of transcriptomic data for drug discovery 
applications while maintaining biologically relevant features.
From an experimental point of view, gene regulatory net- 
works are controlled via activation or inhibition of a specific 
set of signaling pathways. Thus, using the iPANDA signal- 
ing pathway activation scores as input for deep learning 
methods could bring results closer to experimental settings 
and make them more interpretable to bench biologists. Among
the most difficult steps of multilayer perceptron training
are the dimension reduction and feature selection procedures,
which aim to generate the appropriate input for further
learning. Signaling pathway activation scoring using
iPANDA will likely help reduce the dimensionality of
expression data without losing biological relevance and may
be used as an input to deep learning methods, especially for
drug discovery applications. Using iPANDA values as
input data is particularly useful for obtaining reproducible
results when analyzing transcriptomic data from multiple
sources.

The gene expression data from different data sets are
preprocessed using the GCRMA algorithm45 and summarized
using updated chip definition files from the Brainarray
repository (Version 18) for each data set independently.

Taken together, iPANDA demonstrates better perfor- 
mance for the noise reduction test in comparison to other 
pathway analysis approaches, suggesting its credibility as a 
powerful tool for noise reduction in transcriptomic data 
analysis. iPANDA has a strong ability to identify potential
biomarkers (or pathway markers) of the phenotype under 
investigation. One of the commonly used methods to assess 
the capability of transcriptomic pathway markers to distin- 
guish between two groups of samples (for example, resis- 
tance and sensitivity to treatment) is to measure their 
receiver operating characteristics area under curve (AUC) 
values. The capacity to generate a high number of biomark- 
ers with high AUC values is a major requirement for any 
prospective transcriptomic data analysis algorithm to be 
used in prediction models. 

There are several widely used collections of signaling 
pathways including Kyoto Encyclopedia of Genes and 
Genomes (KEGG), QIAGEN and NCI Pathway Interaction 
Database. In this study, the collection of signaling pathways
most strongly associated with various types of malignant
transformation in human cells was used, obtained from the
SABiosciences collection
(sabiosciences.com/pathwaycentral.php). A senescence-specific
pathway database can be used to ensure the presence of
multiple pathway markers for the particular condition under
investigation. Each pathway contains an explicitly defined
topology represented as a
directed graph. Each node corresponds to a gene or a set of 
genes while edges describe biochemical interactions 
between genes in nodes and/or their products. All interac- 
tions are classified as activation or inhibition of downstream 
nodes. The pathway size ranges from about twenty to over 
six hundred genes in a single pathway. 

The iPANDA approach for large-scale transcriptomic data
analysis accounts for the gene grouping into modules based
on the precalculated gene coexpression data. Each gene
module represents a set of genes which experience significant
coordination in their expression levels and/or are regulated
by the same expression factors. Therefore, the actual
function for the calculation of the pathway p activation
according to the proposed iPANDA algorithm consists of
two terms. While the first one corresponds to the contribution
of the individual genes, which are not members of any
module, the second one takes into account the contribution
of the gene modules. Therefore, the final function for obtaining
an iPANDA value for the activation of pathway p, which
consists of the individual genes i and gene modules j, has the
following analytical form:


$$\mathrm{iPANDA}_p = \sum_i G_{ip} + \sum_j M_{jp}$$

The contribution of the individual genes (Gip) and the
gene modules (Mjp) is computed as follows:

$$G_{ip} = w_i^{S} \cdot w_{ip}^{T} \cdot A_{ip} \cdot \lg(fc_i)$$

$$M_{jp} = \max\left(w_i^{S}\right) \cdot \max\left(w_{ip}^{T}\right) \cdot \frac{1}{N} \sum_{i}^{N} A_{ip} \cdot \lg(fc_i)$$
Here fci is the fold change of the expression level for
gene i in the sample under study relative to the normal level
(average in a control group). As the expression levels are
assumed to be log-normally distributed, and in order to
convert the product over fold change values to a sum,
logarithmic fold changes are utilized in the final equation.
The activation sign Aip is a discrete coefficient showing the
direction in which the particular gene affects the given
pathway. It equals +1 if the product of gene i has a
positive contribution to the pathway activation and -1 if it
has a negative contribution. The factors wiS and wipT are
the statistical and topological weights of gene i, each
ranging from 0 to 1. The derivation procedure
for these factors is described in detail in the subsequent
sections. Since lg(fci) and Aip values can be positive or
negative, the iPANDA values for the pathways can also have
different signs. Thus, positive or negative iPANDA values
correspond to pathway activation or inhibition, respectively.
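
The two-term score described above can be illustrated with a minimal
Python sketch. It is not the reference iPANDA implementation: the
function name `ipanda_score`, the dictionary-based inputs, the reading
of lg as a base-10 logarithm, and the averaging of signed log fold
changes within a module are assumptions made for illustration only.

```python
import math

def ipanda_score(fold_change, stat_w, topo_w, sign, modules):
    """Illustrative iPANDA value for a single pathway.

    fold_change, stat_w, topo_w, sign: dicts keyed by gene name.
    modules: list of gene-name lists (hypothetical module assignment).
    """
    in_module = {g for m in modules for g in m}
    # First term: genes that belong to no module, scored individually.
    score = sum(
        stat_w[g] * topo_w[g] * sign[g] * math.log10(fold_change[g])
        for g in fold_change
        if g not in in_module
    )
    # Second term: each module contributes the mean signed log fold change,
    # scaled by the largest statistical and topological weights it contains.
    for m in modules:
        mean_signed = sum(sign[g] * math.log10(fold_change[g]) for g in m) / len(m)
        score += max(stat_w[g] for g in m) * max(topo_w[g] for g in m) * mean_signed
    return score

example = ipanda_score(
    fold_change={"a": 10.0, "b": 100.0, "c": 1.0},
    stat_w={"a": 1.0, "b": 0.8, "c": 0.2},
    topo_w={"a": 0.5, "b": 1.0, "c": 0.4},
    sign={"a": 1, "b": 1, "c": -1},
    modules=[["b", "c"]],
)
```

A positive return value would be read as pathway activation and a
negative one as inhibition, in line with the sign convention above.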

Obtaining Gene Importance Factors 

In order to estimate the topological weight (wipT), all 
possible walks through the gene network are calculated on 
the directed graph associated with the pathway map. The 
nodes of the graph represent genes or gene modules, while 
the edges correspond to biochemical interactions. The nodes 
which have zero incoming edges are chosen as the starting 
points of the walks and those which have zero outgoing 
edges are chosen as the final points. Loops are forbidden 
during walks computation. The number of walks Nip 
through the pathway p which include gene i is calculated for 
each gene. Then wipT is obtained as the ratio of Nip to the 
maximum value of Njp over all genes in the pathway: 


$$w_{ip}^{T} = \frac{N_{ip}}{\max_{j}\left(N_{jp}\right)}$$
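
As a hedged illustration of the walk-counting step, the sketch below
assumes the pathway graph is acyclic (consistent with loops being
forbidden) and counts, for every node, the source-to-sink walks that
pass through it as the product of walks reaching the node and walks
leaving it. The function name `topological_weights` and the edge-list
input format are hypothetical.

```python
from functools import lru_cache

def topological_weights(nodes, edges):
    """Return {node: N_ip / max(N_jp)} for a directed acyclic pathway graph."""
    succ = {n: [] for n in nodes}
    pred = {n: [] for n in nodes}
    for u, v in edges:
        succ[u].append(v)
        pred[v].append(u)

    @lru_cache(maxsize=None)
    def walks_from_sources(n):
        # A node with no incoming edges is a starting point (one trivial walk).
        return 1 if not pred[n] else sum(walks_from_sources(p) for p in pred[n])

    @lru_cache(maxsize=None)
    def walks_to_sinks(n):
        # A node with no outgoing edges is a final point.
        return 1 if not succ[n] else sum(walks_to_sinks(s) for s in succ[n])

    counts = {n: walks_from_sources(n) * walks_to_sinks(n) for n in nodes}
    top = max(counts.values())
    return {n: c / top for n, c in counts.items()}

weights = topological_weights(
    nodes=["A", "B", "C", "D", "E"],
    edges=[("A", "C"), ("B", "C"), ("C", "D"), ("C", "E")],
)
```

In this toy graph every walk passes through node C, so C receives the
maximum weight and the remaining nodes receive half of it.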


The statistical weight depends on the p-values, which are
calculated from a group t-test for case and normal sets of
samples for each gene. The method called p-value
thresholding is commonly used to filter out spurious genes
which demonstrate no significant differences between sets.
However, a major issue with the use of sharp threshold
functions is that it can introduce instability in the filtered
genes and, as a consequence, in pathway activation scores
between the data sets. Additionally, the pathway activation
values become sensitive to an arbitrary choice of the cutoff
value. In order to address this issue, using a smooth
threshold function is suggested. In the present study, the cosine
function on a logarithmic scale is utilized:


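$$w_i^{S} = \begin{cases} 0, & p > p_{max} \\ \dfrac{1}{2}\left[\cos\left(\pi \, \dfrac{\log p - \log p_{min}}{\log p_{max} - \log p_{min}}\right) + 1\right], & p_{min} < p \le p_{max} \\ 1, & p \le p_{min} \end{cases}$$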


where pmin and pmax are the low and high threshold
values. In this study the p-value thresholds are equal to 10^-7 and
10^-1, respectively. For the threshold values given, over 58%
of all genes pass the high threshold and about 12% also pass
the low threshold for the data under investigation. Hence over 45%
of the genes in the data set receive intermediate wiS values.
Therefore, more stable results for pathway activation scores
between data sets can be achieved using this approach.
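
A minimal sketch of such a smooth threshold is given below, assuming
the raised-cosine form on a logarithmic p-value scale with the
thresholds stated above as defaults. The function name
`statistical_weight` is hypothetical.

```python
import math

def statistical_weight(p, p_min=1e-7, p_max=1e-1):
    """Smooth p-value weight: 1 below p_min, 0 above p_max, cosine ramp between."""
    if p > p_max:
        return 0.0
    if p <= p_min:
        return 1.0
    # Position of p between the two thresholds on a log scale, in (0, 1].
    frac = (math.log(p) - math.log(p_min)) / (math.log(p_max) - math.log(p_min))
    return (math.cos(math.pi * frac) + 1.0) / 2.0

w_mid = statistical_weight(1e-4)  # halfway between the thresholds on a log scale
```

Because the weight changes continuously, a small shift in a gene's
p-value near either threshold changes its contribution only slightly,
which is the stability property the text describes.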

Grouping Genes into Modules 

To obtain the gene modules, two independent sources of
data were utilized: the human database of coexpressed genes
COEXPRESdb18 and the database of the downstream genes
controlled by human sequence-specific transcription
factors19. The latter is simply intersected with the genes from
the pathway database used, while correlation data from
COEXPRESdb is clustered using a Euclidean distance matrix.

Distances were obtained according to the following equa- 
tion: 


$$r_{ij} = 1 - \mathrm{corr}_{ij}$$


where corr_ij is the correlation between expression levels of
genes i and j. DBScan and hierarchical clustering with an
average linkage criterion were utilized to identify clusters.
Only clusters with an average internal pairwise correlation
higher than 0.3 were considered. Clusters obtained from the
transcription factors database and coexpression database
were recursively merged to remove duplicates. A pair of
clusters was combined into one during the merging procedure
if the intersection level between the clusters was higher
than 0.7. As a result, a set of 169 gene modules which
includes a total of 1021 unique genes was constructed.
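
The recursive merging step can be sketched as follows. The text does
not define the intersection level, so this sketch assumes it is the
size of the overlap divided by the size of the smaller cluster; that
definition and the function name `merge_modules` are assumptions.

```python
def merge_modules(clusters, threshold=0.7):
    """Repeatedly merge any pair of clusters whose overlap exceeds threshold."""
    merged = [set(c) for c in clusters if c]
    changed = True
    while changed:
        changed = False
        for i in range(len(merged)):
            for j in range(i + 1, len(merged)):
                a, b = merged[i], merged[j]
                # Assumed intersection level: shared genes / smaller cluster size.
                overlap = len(a & b) / min(len(a), len(b))
                if overlap > threshold:
                    merged[i] = a | b
                    del merged[j]
                    changed = True
                    break
            if changed:
                break  # restart the scan, since the cluster list has changed
    return merged

modules = merge_modules([
    {"A", "B", "C", "D"},   # e.g. from the coexpression database
    {"A", "B", "C"},        # e.g. from the transcription factor database
    {"X", "Y"},
])
```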

Statistical Credibility of the iPANDA Values

The p-values for the iPANDA pathway activation scores 
are obtained using weighted Fisher’s combined probability 
test. 

Algorithm Robustness Estimation 

In order to quantitatively estimate the robustness of the 
algorithm between data sets, the Common Marker Pathway 
(CMP) index is introduced. The CMP index is a function
of the number of pathways considered as markers that are 
common between data sets. It also depends on the quality of 
the treatment response prediction when these pathways are 
used as classifiers. The CMP index is defined as follows: 


$$\mathrm{CMP} = \frac{1}{n} \sum_{i} \sum_{j=1}^{n} \ln(N_i) \cdot \left(AUC_{ij} - AUC_{R}\right)$$


where n is the number of data sets under study, Ni is the
number of genes in the pathway i and AUCij is the value of
the ROC area under curve which shows the quality of the
separation between responders and non-responders to
treatment when pathway i is used as a classifier for the j-th
data set. AUCR is the AUC value for a random classifier and
equals 0.5. A pathway is considered a marker if its AUC value
is higher than 0.8. The ln(Ni) term is included to increase the
contribution of the larger pathways because they have a
smaller probability of randomly getting a high AUC value.
Higher values of the CMP index correspond to more
robust prediction of pathway markers across the data sets
under investigation, while a zero value of the CMP index
corresponds to an empty intersection of the pathway marker lists
obtained for the different data sets.
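
A hedged sketch of a CMP-style index follows. It assumes that the sum
runs over pathways that are markers (AUC above 0.8) in every data set
and that the total is divided by the number of data sets n; the
normalization, the function name `cmp_index` and the input layout are
assumptions rather than details confirmed by the text.

```python
import math

def cmp_index(auc, pathway_size, marker_auc=0.8, auc_random=0.5):
    """auc: {pathway: [AUC in data set 1, ..., AUC in data set n]}.
    pathway_size: {pathway: number of genes in the pathway}.
    """
    if not auc:
        return 0.0
    n = len(next(iter(auc.values())))
    # Keep only pathways that qualify as markers in all data sets.
    common = [p for p, vals in auc.items() if all(v > marker_auc for v in vals)]
    total = sum(
        math.log(pathway_size[p]) * (v - auc_random)
        for p in common
        for v in auc[p]
    )
    return total / n

cmp_value = cmp_index(
    auc={"p1": [0.90, 0.85], "p2": [0.90, 0.60]},
    pathway_size={"p1": 100, "p2": 50},
)
```

In this example only p1 is a marker in both data sets, so p2 does not
contribute, and an empty intersection would give a value of zero.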

Clustering of Data Samples 

In order to apply iPANDA to the Paclitaxel treatment
response prediction over several independent data sets, the
pathway activation values were normalized to Z-scores
independently for each data set. The expected values used
for the Z-scoring procedure were adjusted to the number of
responders and non-responders in the data set under study.
The pairwise distance matrix between samples utilized for
further clustering is obtained using the following equation:

$$D_{ij} = \sqrt{\frac{1}{N} \sum_{p} \left(\mathrm{iPANDA}_{ip} - \mathrm{iPANDA}_{jp}\right)^{2}}$$


Here Dij is the distance between samples i and j, and N is the
number of pathway markers used for the distance
calculation. iPANDAip and iPANDAjp are the normalized
iPANDA values for the pathway p for the samples i and j,
respectively. Normalization of iPANDA values to
Z-scores implies that all the considered pathway markers
have an equal contribution to the distance obtained. All
distances were converted into similarities (1-Dij) before the
clustering procedure. Hierarchical clustering using Ward
linkage is performed on the distance matrix to divide the
samples into groups.
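
The normalization and distance computation can be sketched in plain
Python as shown below. The sketch uses an ordinary per-pathway Z-score
and does not reproduce the adjustment of expected values to the
responder and non-responder counts; the function names and the
row-per-sample matrix layout are assumptions.

```python
import math
from statistics import mean, pstdev

def zscore_columns(matrix):
    """Z-score each pathway (column) across the samples (rows) of one data set."""
    stats = [(mean(col), pstdev(col)) for col in zip(*matrix)]
    return [
        [(x - m) / s if s > 0 else 0.0 for x, (m, s) in zip(row, stats)]
        for row in matrix
    ]

def pairwise_distance(z):
    """Root mean squared difference over the N pathway markers for each sample pair."""
    n_markers = len(z[0])
    return [
        [math.sqrt(sum((a - b) ** 2 for a, b in zip(zi, zj)) / n_markers) for zj in z]
        for zi in z
    ]

scores = [[1.0, 2.0], [3.0, 2.0]]  # two samples, two pathway markers
dist = pairwise_distance(zscore_columns(scores))
```

The resulting matrix is symmetric with a zero diagonal and could then
be passed to any hierarchical clustering routine that accepts
precomputed distances.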

Transcriptome (Gene Expression) Difference 

In a preferred embodiment, two iPANDA transcriptome
signatures, one from a senescent patient tissue or organ to be
treated (or similar proxy profile) and another representing a
target, nonsenescent tissue or organ, are compared to
observe transcriptome (gene expression) differences.
Principal component analysis is typically applied. Gene
expression trees or difference matrices may also be used, as is
known in the art. In a preferred embodiment, a difference
matrix provides the vector inputs for a machine learning
architecture as described below.

In a preferred embodiment, gene expression patterns are 
subjected to Principal Component Analysis (PCA). In an 
embodiment wherein many different tissue samples are 
taken, rather than just two, several clusters are formed, 
suggesting related biological functions for these clusters. 
For example, the gastrointestinal tissues, esophagus, rectum 
and colon all grouped together, and hematopoietic tissues 
(bone marrow and spleen) and monocytes also clustered. 
Because transcriptomes of functionally related cell types
often exhibit substantial hierarchical structure, a neighbor-joining
gene expression tree can be generated based on mean
gene expression levels. Similar to the PCA results, bone
marrow and spleen clustered with monocytes, while skeletal 
muscle and heart muscle grouped together and were distinct 
from smooth muscle. Thus, for any given cell type, e.g., a 
neuron, epigenetic marks reflect both the prior (e.g., state in 
the germ layer and derived cell lineages) and present regu- 
latory landscapes. 

Differential Gene Expression of Cells and Tissues 

In heart and skeletal muscle, 455 out of 12,044 genes were
differentially expressed (phylogenetic analysis of variance
(ANOVA) P value ≤ 0.01) compared with other cells and
tissues. Approximately 44% of these genes were associated
with the tricarboxylic acid (TCA) cycle and respiration, in
agreement with the metabolic organization and energy
sources of these tissues.

Neurons, which are critical for cognitive and motor func- 
tions, have cell lifespans that likely exceed the lifespan of 
the organism. Comparing neurons to shorter-lived cells and 
tissues is conceptually similar to comparing gene expression 
of long-lived mammals to related short-lived species, e.g., 
examining African mole rats against other rodents.15 
Accordingly, neurons should possess a gene expression 
signature associated with low turnover/long lifespan, in 
addition to the patterns indicative of neuronal function. Out
of 12,044 genes, 1,438 were differentially expressed in
neurons (P ≤ 0.01), and gene set enrichment analysis showed
enrichment for functions associated with lysosomes, protea- 
somes, ribosomal proteins and apoptosis. Neurons presented 
with reduced expression of 27 ribosomal proteins and mul- 
tiple 20S proteasome subunit genes, consistent with distinct 
protein metabolism required to fine-tune self-renewal and 
synaptic plasticity. This group of genes was not correlated 
with cell and tissue turnover, suggesting that this expression 
pattern is unique to long-lived neurons. Reduced protein 
metabolism, which may be induced by dietary restriction 
and other interventions, is known to associate with extended 
lifespan in a number of model organisms. Furthermore, 
expression of the tumor suppressor p53 (TP53) was signifi- 
cantly reduced (P<0.001) in neurons, where it was expressed 
at a level gene expression pattern of cell and tissue turnover. 

Inputs to Machine Learning Platform and iPANDA 

In a preferred embodiment, the general design of the
computational procedure that outputs the drug classification of
the invention comprises five sequential steps: 1) transcriptomic
similarity search, 2) protein target based search, 3) structural
similarity based search, 4) transcriptomic signature
screening and 5) deep neural network based search.

Regarding 1), In silico Pathway Activation Network
Decomposition Analysis (iPANDA) can be applied to
transcriptomic tissue-specific aging datasets obtained from Gene
Expression Omnibus (GEO) with a total number of samples
not less than 250 for each tissue. Tissue-specific cellular
senescence pathway marker sets are identified. Only pathways
considerably perturbed in senescent cells (pathways
with iPANDA-generated p-values less than 0.05) are
considered as pathway markers. iPANDA scores are precalculated
for Broad Institute LINCS Project data and are utilized for
calculating transcriptomic compound similarity. Euclidean
or other similarity between vectors of iPANDA scores for
senolytics and other compounds of interest is calculated
using data on cell lines for the corresponding tissue. Only
previously identified tissue-specific pathway markers are
used for the similarity calculation.

Regarding 2), using LINCS Project data on knockdown
cell lines, the same procedure is performed to identify key
target genes involved in the action of previously identified
senolytic compounds D (Dasatinib), N (Navitoclax) and Q
(Quercetin). The list of target genes is enriched by proteins
likely to interact with these compounds using the STITCH
human drug-target interaction database. Pharmacophore-based
search and publicly available docking algorithms are
applied to identify the compounds which specifically bind
the identified targets with the highest affinity.

3) Structural similarity search is performed for three
compounds already known to have senolytic properties
(D, N, Q). Using publicly available molecular docking
algorithms, the importance weights for chemical groups are
defined. This information is utilized for QSAR-based structure
generation and filtering. Compounds from the PubChem
database can also be screened in a similar procedure
in order to find structural analogues of D, N and Q.

4) To investigate potential effects of natural compounds
without known molecular targets, GEO and LINCS Project
gene expression data are used. In both databases, datasets
can be examined that consist of transcriptomes of cell lines
before and after treatment with multiple different chemical
compounds. For aging dataset scoring, the same
GEO datasets GSE66236, GSE69391, GSE18876,
GSE21779, GSE38718, GSE59980, GSE52699, GSE48662
are used. It can be assumed that an anti-aging compound
would shift an aged transcriptome toward a "younger"
state. Mechanistically, this reflects the fact that if the activity
of a certain regulatory pathway increases (or decreases) with
aging, its end targets would increase (or decrease) in expression
with aging. By searching for compounds which decrease (or
increase) the expression of those end targets, the drugs
which target these aging-associated pathways (or some of their
master regulators) could be discovered.

First, differentially expressed genes associated with aging
are found, as well as differentially expressed genes after
drug treatment. For microarray-based transcriptome data, a
limma test of differential gene expression is used. Each set
of differentially expressed genes is ordered according to
the following measure, which takes into account both the
magnitude and the statistical significance of the effect:
FC · max(0, -log(pvalue)), where FC is the fold change of gene
expression between groups and pvalue represents the result of
the limma test.
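
The ordering measure can be illustrated with the short sketch below.
It assumes that FC is a signed logarithmic fold change (as reported by
limma), that the logarithm of the p-value is base 10, and that genes
are sorted in descending order of the measure; none of these choices
is fixed by the text, and the function name `rank_genes` is
hypothetical.

```python
import math

def rank_genes(results):
    """results: {gene: (fold_change, p_value)}; returns gene names, best first."""
    def measure(item):
        fc, p = item[1]
        # Clip p away from zero so the logarithm stays finite.
        return fc * max(0.0, -math.log10(max(p, 1e-300)))
    return [gene for gene, _ in sorted(results.items(), key=measure, reverse=True)]

order = rank_genes({
    "g1": (2.0, 1e-4),   # strong, significant up-regulation
    "g2": (3.0, 0.5),    # large effect but weak significance
    "g3": (-1.0, 1e-6),  # significant down-regulation
})
```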

A statistically motivated score estimating the anti-aging
abilities of a compound is designed. Significantly up- or
down-regulated genes are defined as those with
FDR<0.01 (after multiple-testing correction). A Fisher exact
test is performed which measures the association of two
characteristics of each gene: being significantly downregulated
after the drug treatment and being significantly upregulated
during aging. Vice versa, the same test is performed for
significantly upregulated genes after the drug treatment
versus significantly downregulated genes during aging. The
better (smaller) of the p-values of those two tests is taken as
the score for the given drug against aging. A multiple testing
correction of the obtained p-values for the number of compounds
under study can be performed. The same methodology is applied for
screening natural compounds within the LINCS transcriptomic
database whose effects are similar to those of other drugs, such
as metformin.
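
A self-contained sketch of this scoring scheme is given below. It
assumes a one-sided Fisher exact test for enrichment (computed from
the hypergeometric tail) and takes the smaller of the two p-values as
the score; the sidedness of the test, the function names and the
set-based inputs are assumptions.

```python
from math import comb

def fisher_enrichment_p(a, b, c, d):
    """One-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]:
    probability of an overlap of at least a under independence."""
    row1, col1, total = a + b, a + c, a + b + c + d
    upper = min(row1, col1)
    tail = sum(
        comb(row1, k) * comb(total - row1, col1 - k) for k in range(a, upper + 1)
    )
    return tail / comb(total, col1)

def anti_aging_score(drug_down, drug_up, aging_up, aging_down, all_genes):
    """Smaller of the two enrichment p-values (drug-down vs aging-up, and vice versa)."""
    def p_value(x, y):
        x, y = x & all_genes, y & all_genes
        a, b, c = len(x & y), len(x - y), len(y - x)
        d = len(all_genes) - a - b - c
        return fisher_enrichment_p(a, b, c, d)
    return min(p_value(drug_down, aging_up), p_value(drug_up, aging_down))

genes = {"g1", "g2", "g3", "g4", "g5", "g6"}
score = anti_aging_score(
    drug_down={"g1", "g2", "g3"}, drug_up={"g4"},
    aging_up={"g1", "g2", "g3"}, aging_down={"g5"},
    all_genes=genes,
)
```

A multiple-testing correction across compounds would be applied to
these scores afterwards and is not shown here.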

5) The deep neural network based classifier of compound
pharmacological class can be trained on many compounds.
Training data include structural data (QSAR, SMILES),
transcriptomic response LINCS Project data at the gene level
and pathway level (iPANDA) and the drug-target interaction
network from the STITCH database. The specific class of
prospective senolytic compounds is declared during training.
This class includes compounds identified in steps 1, 2 and
3 of the study.

Established classifier accuracy is recorded after the
class-balancing of the test set. A list of senolytic compounds
is obtained for further analysis after scanning the database of
300,000+ compounds. Top-ranking compounds are
obtained in each of the steps, and their intersection is found for
each tissue independently. As a result, compounds are
identified as having the best senolytic properties for the tissue. A
set of structural analogues, which possess similar molecular
properties and likely senolytic properties, is obtained
according to the procedure in step 3.

6) Finding structural analogs of desired molecules. An
aim is also to find structural analogs of a molecule of interest
for protein-ligand interaction. This approach is highly
efficient for increasing the specificity of binding with targets
(proteins).

In the first step, we provide an analysis of possible targets
for the drug compounds. This can be done in two ways: 1)
using specific programs for searching databases for different
interactions of molecules of interest with proteins/genes
(e.g. STITCH); 2) analysis of experimental data in published
articles. For the molecule in question, the second way is
chosen, as it helps to select the best variants of experimentally
confirmed protein-ligand interactions. From the literature
analysis, n targets are chosen according to the following
parameters: 1) specific binding of the target with the drug(s);
2) the lowest IC50; 3) the presence of the structure in the
Protein Data Bank.

After that, docking is applied to all of the structures for
all possible active sites and additional binding pockets.
The best positions of the drugs in the target are chosen, and
then an additional docking is done using a flexible chain
algorithm.

Then all the structures of the target are analyzed according
to the following criteria: 1) number of hydrogen bonds, 2)
hydrophobic/hydrophilic interactions, 3) number of π-π interactions.
This information is used further to understand the key
principles by which a molecule can bind to the specific site
of the target. From such analysis one can find the
rules by which a molecule can be modified for better binding
properties with a specific target. Using the software, the
analogs are found according to the rules for the
molecule. After that, in silico toxicology tests are performed
to select non-toxic analogs. These new non-toxic analogs
are again docked into the binding site of the target for
interaction analysis, and those which show the best scores
are selected as the most promising ones.
Other structural analogs and conformers can be extracted
from the PubChem database.

In a preferred embodiment, a deep neural network (DNN), similar
to that described in, for example, Aliper et al., “Deep
learning applications for predicting pharmacological properties
of drugs and drug repurposing using transcriptomic
data”, Mol Pharm, 2016 Jul. 5; 13(7): 2524-2530, and
Mamoshina et al., “Applications of Deep Learning in
Biomedicine”, Mol Pharm, 2016 March 13(5), is used, in
combination with a cellular signature database such as the
LINCS database and a drug therapeutic use database such as
MeSH, as inputs to the DNN in order to output drug
classifications to develop a therapeutic protocol, in this case
to categorize and choose drugs for a senescence or other
treatment protocol. LINCS, the US Library of Integrated
Network-Based Cellular Signatures Program, aims to create a
network-based understanding of biology by cataloging changes
in gene expression and other cellular processes that occur
when cells are exposed to a variety of perturbing agents.
MeSH (Medical Subject Headings) is the US National
Library of Medicine controlled vocabulary thesaurus used
for indexing articles for PubMed, the free search engine of
references and abstracts on life sciences and biomedical
topics, also from the US National Library of Medicine.

An adversarial autoencoder (AAE) works by matching the
aggregated posterior to the prior, which ensures that generating
from any part of the prior space results in meaningful samples.
As a result, the decoder of the
adversarial autoencoder learns a deep generative model that
maps the imposed prior to the data distribution. An AAE can
be used in applications such as semi-supervised classification,
disentangling style and content of images, unsupervised
clustering, dimensionality reduction and data visualization.
AAEs are used, for example, in generative modeling
and semi-supervised classification tasks. Thus an AAE turns
an autoencoder into a generative model. The AAE is often
trained with dual objectives: a traditional reconstruction
error criterion, and an adversarial training criterion that
matches the aggregated posterior distribution of the latent
representation of the autoencoder to an arbitrary prior
distribution.

In a preferred embodiment derived from Kadurin, the
method uses a 7-layer AAE architecture with the latent
middle layer serving as a discriminator. As input and
output, the AAE uses a vector of binary fingerprints and
the concentration of the molecule. In the latent layer we also
introduce a neuron responsible for growth inhibition
percentage, which when negative indicates a reduction in the
number of tumor cells after the treatment. To train the AAE,
one uses cell line assay data for compounds profiled in a
cell line. The output of the AAE can then be used to screen
drug compounds, such as the 72 million compounds in
PubChem, and then select candidate molecules with potential
anti-senescent properties.

The latest class of non-parametric approaches for deep 
generative models is known as generative adversarial net- 
work (GAN). In this new framework, initially proposed by 
Goodfellow, generative models are estimated via an adver- 
sarial process. In practice, two models are simultaneously 
trained: a generative model G that captures the data distri- 
bution, and a discriminative model D that estimates the 
probability that a sample came from the training data rather 
than G. The training procedure for G is to maximize the 
probability of D making an error. Thus, this framework does 
not correspond to the standard optimization problem as it is 
based on a value function that one model seeks to maximize 
and the other seeks to minimize. The process terminates at 
a saddle point that is a minimum with respect to one model's 
strategy and a maximum with respect to the other model's 
strategy. Because GANs do not require an explicit repre- 
sentation of the likelihood, neither approximate inference 
nor Markov chains are necessary. Consequently GANs 
provide an attractive alternative to maximum likelihood 
techniques. 
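
For reference, the value function of this two-player game is commonly
written as follows in the GAN literature. The symbols p_data (the data
distribution) and p_z (the prior over the generator's noise input z)
are introduced here for illustration and do not appear elsewhere in
this description.

$$\min_{G} \max_{D} V(D, G) = \mathbb{E}_{x \sim p_{data}(x)}\left[\log D(x)\right] + \mathbb{E}_{z \sim p_{z}(z)}\left[\log\left(1 - D(G(z))\right)\right]$$

The discriminative model D seeks to maximize V, while the generative
model G seeks to minimize it, which corresponds to the saddle point
described above.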

Generative capabilities of deep adversarial network
techniques open the door to new perspectives, as they could
contribute to overcoming several limitations of current
data-driven computational methods. For example, we can apply
GANs to transcriptomics data for the generation of new
samples for desired phenotypic groups, and in chemoinformatics
for the prediction of the physical, chemical, or
biological properties and structures of molecules. Quantitative
structure-activity relationships (QSAR) and quantitative
structure-property relationships (QSPR) are still considered
the modern standard for predicting properties of novel
molecules. To that end, many ML-based approaches have
been developed to tackle such problems, but recent results
show that the DL-based methods match or outperform other
state-of-the-art methods and demonstrate better predictive
performance, parsimony and interpretability, and web-based
predictors are available in some cases. Furthermore, new
methods based on convolutional neural networks are able to
perform predictions by directly using graphs of arbitrary size
and shape as inputs rather than fixed feature vectors, and one
can expect to see the development of more flexible deep
generative architectures that can be applied directly to other
structured data such as sequences, trees, graphs, and 3D
structures. Thus, the deep adversarial network techniques
could be used to improve accuracy, generative capabilities
and predictive power and address several issues including
computational cost, limited computation at each layer and
limited information propagation across the graph.

Target prediction and mapping of bioactive small compounds
and molecules by analyzing binding affinities and
chemical properties is another area of research that makes
extensive use of data-driven computational methods in order
to optimize the use of data available in existing repositories.
Despite promising results and the availability of web platforms,
such as SwissTargetPrediction, to computationally identify
new targets for uncharacterized molecules or secondary
targets for known molecules, in general the
available methods remain too inaccurate for systematic
binding predictions, and physical experiments remain the
state of the art for binding determination. In this field,
DL-based methods, such as the recently released
AtomNet based on deep convolutional neural networks, have
made it possible to circumvent several limitations and outperform
more traditional computational methods, including RFs and
SVMs, for QSAR and ligand-based virtual screening. One
can expect that the development of DL methods making use
of the GAN framework will also lead to significant improvement
with respect to prediction accuracy and power.

In a preferred embodiment, the adversarial network and 
the autoencoder are trained jointly with SGD in two 
phases—the reconstruction phase and the regularization 
phase—executed on each mini-batch. In the reconstruction 
phase, the autoencoder updates the encoder and the decoder 
to minimize the reconstruction error of the inputs. In the 
regularization phase, the adversarial network first updates its 
discriminative network to tell apart the true samples (gen- 
erated using the prior) from the generated samples (the 
hidden codes computed by the autoencoder). The adversarial 
network then updates its generator (which is also the 
encoder of the autoencoder) to confuse the discriminative 
network. Once the training procedure is done, the decoder of 
the autoencoder will define a generative model that maps the 
imposed prior of p(z) to the data distribution. 

In a preferred embodiment, the input layer is divided into
a fingerprint part and a concentration input neuron. In a
preferred embodiment, an AAE is trained to encode and
reconstruct not only molecular fingerprints, but also
experimental concentrations. The Encoder consists of two
consecutive layers L1 and L2 with 128 and 64 neurons,
respectively. The decoder consists of the two layers L'1 and L'2,
comprising 64 and 128 neurons, respectively. The latent layer
consists of 5 neurons, one of which is the growth inhibition
(GI) neuron and the four others are discriminated against a
normal distribution. Since we train an encoder net to predict
“efficiency” against “senescence” in a single neuron of the
latent layer, we divide the latent vector into two parts, ‘GI’
and ‘representation’, and add a regression term to the encoder
cost function. Furthermore, we restrict our encoder to map the
same fingerprint to the same latent vector independently of
input concentration by an additional ‘manifold’ cost. Here we
compute the mean and variance of the concentrations over the
whole dataset and then use them to sample concentrations for
the ‘manifold’ step. On each step we sample a fingerprint from
the training set and a batch of concentrations from a normal
distribution with the given mean and variance. Training of the
net with the ‘manifold’ loss is performed by maximization of
the cosine similarity between ‘representations’ of similar
fingerprints with different concentrations.

All these changes result in a 5-step training iteration
instead of the 3-step iteration of the basic AAE model: (a) the
Discriminator is trained to distinguish between the given latent
distribution and the encoded ‘representation’; (b) the Encoder
is trained to confuse the Discriminator with generated
‘representations’; (c) the Encoder and Decoder are trained
jointly as an Autoencoder; (d) the Encoder is trained to fit the
“score” part of the latent vector; (e) the Encoder is trained
with the “manifold” cost.

The first two steps (a, b) are trained as usual for adversarial
networks. The Autoencoder cost function is computed as the sum
of the log loss of the fingerprint part and the mean squared
error (MSE) of the concentration part; MSE is also used as the
regression cost function. Example code for a preferred AAE
is available at github.com/spoilt333/onco-aae.
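As a rough illustration of the Autoencoder cost described above, the following Python sketch adds a log loss over the fingerprint bits to a squared error on the concentration. It is a simplified sketch, not the referenced repository code; averaging the log loss over bits (rather than summing it) and the function names are assumptions.

```python
import math


def log_loss(targets, probs, eps=1e-12):
    """Mean binary cross-entropy between 0/1 targets and probabilities."""
    total = 0.0
    for t, p in zip(targets, probs):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += -(t * math.log(p) + (1 - t) * math.log(1.0 - p))
    return total / len(targets)


def mse(targets, preds):
    """Mean squared error between two equal-length sequences."""
    return sum((t - p) ** 2 for t, p in zip(targets, preds)) / len(targets)


def autoencoder_cost(fp_true, fp_recon, conc_true, conc_recon):
    """Log loss of the fingerprint part plus MSE of the concentration part."""
    return log_loss(fp_true, fp_recon) + mse([conc_true], [conc_recon])
```

The same `mse` helper could serve as the regression cost for the ‘GI’ neuron, since the text states that MSE is used there as well.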

Experimental/Simulations/Models 

1. Single Biopsy (or Existing Individual Profile). 

A single biopsy of the liver or lung is taken from the patient
according to standard procedures in a medical center, as
described on the nhlbi.nih.gov website. For a lung biopsy, a
few samples of lung tissue are taken from several places in
the lungs. The samples are examined under a microscope, and
transcriptome and gene expression profiles are also analyzed.
This procedure can help rule out other conditions, such as
sarcoidosis, cancer, or infection. A lung biopsy can also show
how far a disease has advanced.

There are several procedures for obtaining lung tissue samples.

Video-assisted thoracoscopy. This is the most common
procedure used to obtain lung tissue samples. An endoscope
with an attached light and camera is inserted into the chest
through small cuts between the ribs. The endoscope provides a
video image of the lungs and allows tissue samples to be
collected. This procedure must be done in a hospital.

Bronchoscopy. For a bronchoscopy, a thin, flexible tube is
passed through the nose or mouth, down the throat, and into
the airways. At the tube's tip are a light and a mini-camera,
which allow the windpipe and airways to be seen. A forceps is
then inserted through the tube to collect tissue samples.

Bronchoalveolar lavage. During bronchoscopy, a small 
amount of salt water (saline) is injected through the tube into 
lungs. This fluid washes the lungs and helps bring up cells 
from the area around the air sacs. These cells are examined 
under a microscope. 

Thoracotomy. For this procedure, a few small pieces of 
lung tissue are removed through a cut in the chest wall 
between ribs. Thoracotomy is done in a hospital. 

For a liver biopsy, a few samples of liver tissue are taken
from several places in the liver. The samples are examined
under a microscope, and transcriptome and gene expression
profiles are also analyzed.

There are several procedures for obtaining liver tissue samples.

Percutaneous Liver Biopsy. The health care provider
either taps on the abdomen to locate the liver or uses one of
the following imaging techniques: ultrasound or computerized
tomography (CT). The provider then takes samples with a needle.

Transvenous Liver Biopsy. When a person's blood clots
slowly or the person has ascites (a buildup of fluid in the
abdomen), the health care provider may perform a transvenous
liver biopsy. The health care provider applies local
anesthetic to one side of the neck, makes a small incision
there, injects contrast medium into the sheath, and takes an
x ray. The provider then inserts and removes the biopsy
needle, several times if multiple samples are needed.

Laparoscopic Liver Biopsy. Health care providers use this 
type of biopsy to obtain a tissue sample from a specific area 
or from multiple areas of the liver, or when the risk of 
spreading cancer or infection exists. A health care provider
may take a liver tissue sample during laparoscopic surgery 
performed for other reasons, including liver surgery. 

2. Pathway Signature Measurement 

Transcriptomic Data: 

Data sets containing gene expression data related to IPF
patients, with normal healthy lung tissue used as a reference,
were downloaded from the GEO database (ncbi.nlm.nih.gov/geo/)
(21 data sets). IPF and normal data from the different data
sets were preprocessed using the GCRMA algorithm and
summarized using updated chip definition files from the
Brainarray repository, for each data set independently.

Differentially expressed genes were calculated using the limma
and DESeq2 algorithms for the following comparison groups: IPF
(IPF vs. reference healthy lung tissue); Senescence (old vs.
reference young healthy lung tissue); and Smoking (current
smoker vs. reference non-smoker). Age status data was
available for 2 data sets and smoking status data was
available for 1 data set.

Differentially expressed gene data was used as an input for
the iPANDA algorithm in order to measure the pathway signature
of each comparison group.

Pathway Database Overview: 

There are several widely used collections of signaling 
pathways including Kyoto Encyclopedia of Genes and 
Genomes, QIAGEN and NCI Pathway Interaction Database. 
In this study, we use the collection of signaling pathways 
most strongly associated with various types of malignant 
transformation in human cells obtained from the SABiosci- 
ences collection (sabiosciences.com/pathwaycentral.php). 

3. Compare Signature Profiles. 

A signature profile for each comparison group can be
constructed based on an iPANDA p-value cut-off (p-value<=0.05)
and the common overlap among different data sets: an
intersection cut-off threshold of 15 was used for IPF data, 2
for senescence data and 1 for smoking data.
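The selection rule above can be sketched in a few lines of Python. The sketch assumes that the intersection cut-off is the minimum number of data sets in which a pathway must pass the p-value cut-off; the function name and the pathway labels are hypothetical, and the p-values are made up for illustration.

```python
def signature_profile(pvalues_by_dataset, p_cutoff=0.05, min_datasets=1):
    """Return the pathways whose p-value is <= p_cutoff in at least
    min_datasets of the supplied data sets.

    pvalues_by_dataset: list of dicts mapping pathway name -> p-value,
    one dict per data set.
    """
    counts = {}
    for pvalues in pvalues_by_dataset:
        for pathway, p in pvalues.items():
            if p <= p_cutoff:
                counts[pathway] = counts.get(pathway, 0) + 1
    return sorted(name for name, n in counts.items() if n >= min_datasets)


# Illustrative (made-up) p-values for three data sets.
example = [
    {"pathway_A": 0.01, "pathway_B": 0.20, "pathway_C": 0.05},
    {"pathway_A": 0.03, "pathway_B": 0.04, "pathway_C": 0.50},
    {"pathway_A": 0.60, "pathway_B": 0.30, "pathway_C": 0.02},
]
```

Under this reading, the IPF profile would correspond to `min_datasets=15` over the 21 IPF data sets, and the senescence and smoking profiles to `min_datasets=2` and `min_datasets=1`.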

4. Personalize the Treatment. 

DNNs can be used as a tool to predict active compounds
and to generate compounds with a desired efficacy. DNN-based
models can be applied to personalize compounds for individual
patients and to evaluate treatment efficacy and safety.

Machine learning approaches provide tools for the analysis of
biomedical data without prior assumptions about the functional
relations within the data. Deep Neural Network (DNN) based
approaches, such as multi-layered feed-forward neural
networks, are able to fit complex and sparse biomedical data
and to learn highly non-linear dependencies from the raw data
without modification of the features of interest. Deep
learning is a state-of-the-art method for many tasks, from
machine vision to language translation. However, despite the
fact that biomedicine has entered the era of “big data”,
biomedical datasets are usually limited in sample size, and
feature selection and dimensionality reduction of the feature
space usually increase the predictive power of DNNs applied
in the biomedical domain (Aliper, Plis, et al. 2016).

A system can be provided that utilizes quantitative models
with a deep architecture that is able to stratify compounds by
their efficacy for the individual patient based on his or her
personal profile. In part, the personal profile can include the
biological pathways analyzed with the quantitative models.
The following data could be used as input features to the
system: gene expression profiles and signaling pathway
profiles, blood tests (Putin et al. 2016), protein expression
profiles, clinical history as well as a deep representation of
the electronic health record (Miotto et al. 2016).

A system can be provided that utilizes the quantitative
models with a deep architecture that is able to evaluate the
efficacy of the proposed treatment through the quantitative
assessment of the health status of the patient, such as
biological age, life expectancy, and the probability of
survival. The following data could be used as input features
to the system: gene expression profiles and signaling pathway
profiles, blood tests, protein expression profiles, clinical
history as well as a deep representation of the electronic
health record.

A system can be provided that utilizes the quantitative
models with a deep architecture that is able to predict
potential side effects of the treatment. The following data
could be used as input features to the system: gene expression
profiles and signaling pathway profiles, blood tests, protein
expression profiles, clinical history as well as a deep
representation of the electronic health record.

A system can be provided based on a generative model with a
deep architecture (Kadurin et al. 2017) that is able to
generate molecules with desired properties, such as high
efficacy, low toxicity, high bioavailability and the like.
Generated molecules can be evaluated by the DNN-based
systems through efficacy and safety prediction.

Accordingly, a 5R strategy as described herein can be
applied to patients with pre-senescent, senescent and fibrotic
conditions. The 5R strategy includes: Rescue; Remove;
Replenish; Reinforce; and Repeat.

Stage 1. Rescue. 

The first step of the 5R strategy is rescuing pre-senescent
cells in a particular tissue (including liver and lungs).
The pre-senescent phenotype is considered potentially
reversible. In order to rescue cells demonstrating the
pre-senescent phenotype, a specific set of possible
interventions shall be applied. These interventions include
treatment with one senoremediator compound or a combination
of the senoremediator compounds from the list herein.
Senoremediator compounds should be administered orally, by
injection, sublingually, buccally, rectally, vaginally,
cutaneously, transdermally, ocularly, otically or nasally, or
in any other way.

Stage 2. Remove. 

This step is performed to eliminate the cells that have
already entered the irreversible senescent state. Senescent
cells lose their function and pose a constant danger to the
surrounding cells, as described above. Elimination of such
cells may prevent surrounding cells from entering the
senescent phenotype through a positive feedback loop and may
restore normal tissue functioning. In order to eliminate cells
demonstrating the senescent phenotype, a specific set of
possible interventions shall be applied. These interventions
include treatment with one senolytic compound or a combination
of the senolytic compounds from the list below. Senolytic
compounds should be administered orally, by injection,
sublingually, buccally, rectally, vaginally, cutaneously,
transdermally, ocularly, otically or nasally, or in any other
way.

Stage 3. Replenish. 

The second step leads to a general rejuvenation of the cells
in the population but, on the other hand, to a reduction in
the total cell count. This allows the subsequent replenish
step to be used for repopulation of the tissue with functional
cells. Therefore, the pool of stem/progenitor cells in a
particular tissue (including mesenchymal and epithelial stem
cells in the lungs and liver) should be activated in order to
replenish the tissue. The possible interventions needed to
achieve that goal include treatment with one specific compound
or a combination of the compounds from the list below.
Importantly, the compounds should stimulate the proliferation
of the stem cells while preventing the unwanted effects
related to possible uncontrolled proliferation and subsequent
malignant transformation. The compounds should be administered
orally, by injection, sublingually, buccally, rectally,
vaginally, cutaneously, transdermally, ocularly, otically or
nasally, or by another method.

Stage 4. Reinforce. 

This step is used to prevent further potential degradation of
the tissue (or organ). It may include treatment with one
specific compound or a combination of the compounds from the
list below. These compounds should demonstrate one of the
following activities: immunomodulation, in order to prevent
possible malignant transformation and the accumulation of
senescent cells; cytoprotection, in order to retain the
functional state of the tissue; or stimulation of the
macrophages, in order to achieve the specific state of
senophagy (the ability to specifically engulf and digest
senescent cells). The compounds should be administered orally,
by injection, sublingually, buccally, rectally, vaginally,
cutaneously, transdermally, ocularly, otically or nasally, or
by another method.


Stage 5. Repeat. 


The whole multi-stage longevity therapeutics pipeline 
(stages 1-4) can be applied recurrently. The period between 
the therapies is defined individually on the tissue (organ)- 
specific basis and may vary from 1 month to 10 years. 


In an embodiment, the first four steps Rescue; Remove; 
Replenish; Reinforce can be used as a multi-stage longevity 
therapeutics pipeline and can be applied more than once, and 
on an ongoing basis. The period between the therapies is 
defined individually on a tissue, organ, and patient specific 
basis and may vary from 1 month to 10 years between 
treatments, or may essentially be continually ongoing, for 
some or all of the steps. 


EXAMPLES 


The invention includes methods, systems, drugs, apparatus,
and computer program products, among others, to carry out
the following.


FIG. 3 illustrates the accuracy of biological aging assessment
by a transcriptomic clock method compatible with the current
invention. The plot shows the correlation between actual
chronological age (x-axis) and predicted age (y-axis) for
healthy individuals in the validation set. The grey line
represents the linear regression line. Values for r, R² and
p-value are provided at the top of the figure. Note that
the term DiseaseO in this and other figures simply means that
healthy/control subjects were used for such biological aging
assessment.


FIG. 4 illustrates the performance of age-predicting models.
(A) Actual chronological age vs. predicted age for the Deep
Feature Selection Model (DFS) on the validation and testing
sets. The grey line represents the linear regression line.
Values for R² and MAE are provided at the bottom of the
figure.
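The agreement statistics quoted for FIGS. 3 and 4 (r, R² and MAE) can be computed from paired actual and predicted ages as in the following minimal Python sketch. It assumes that R² denotes the coefficient of determination, which the text does not state explicitly; the function names are illustrative, and the p-value is omitted because it requires a t-distribution routine outside the standard library.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_a = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_a) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


def mean_absolute_error(actual, predicted):
    """Average absolute difference, in years for age prediction."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)
```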


FIG. 5 illustrates the performance of age predicting model 
trained on the microarray data on the external validation set 
of RNAseq data. The correlation between actual chrono- 
logical age group (x-axis) with predicted age (y-axis) for 
healthy individuals using the external validation set. Mean 
of the actual chronological age group vs. predicted age for 
the Deep Feature Selection Model (DFS). 


FIG. 6 illustrates the distribution of the number of samples
by age for healthy individuals in the validation set. Blue
(darker) and green (lighter) values are actual chronological
ages and assigned biological ages, respectively. For relatively
healthy people, not surprisingly, the assigned biological age
is close to the chronological age.


FIG. 7 illustrates an example epsilon-prediction accuracy 
for healthy individuals. The epsilon-prediction accuracy is 
defined as follows: 




$$\epsilon\text{-prediction accuracy} = \frac{\sum_{i=1}^{N} 1_A(f_i)}{N}$$

where $f_i$ is the predicted value and $1_A$ is an indicator function
with $A = [y_i - \epsilon;\ y_i + \epsilon]$.

For example, if epsilon=0 and yi=45, the DNN correctly
recognizes this sample if the prediction for the sample
belongs to the interval $[y_i - \epsilon;\ y_i + \epsilon]$.
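The epsilon-prediction accuracy defined above can be computed directly, as in the following short Python sketch. The function name and the example ages are illustrative only and are not data from the study.

```python
def epsilon_accuracy(actual, predicted, epsilon):
    """Fraction of samples whose predicted value f_i lies in the
    closed interval [y_i - epsilon, y_i + epsilon]."""
    hits = sum(
        1
        for y, f in zip(actual, predicted)
        if y - epsilon <= f <= y + epsilon
    )
    return hits / len(actual)


# Illustrative (made-up) ages in years.
actual_ages = [45, 30, 60, 25]
predicted_ages = [50, 41, 58, 25]
```

With these values, a tolerance of 10 years counts three of the four predictions as correct, while a tolerance of 0 counts only the exact match.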

FIG. 8 is a plot that illustrates clustering by age for
healthy individuals using the t-SNE algorithm. The color bar
indicates the age of the sample. For this particular example,
there are no clearly defined clusters of healthy individuals by
age.


Example 1 


Age Prediction Models as a Target Identification Tool

FIG. 9 illustrates the list of selected targets based on the
importance ranking provided by the deep transcriptomic
clocks and other machine learning methods. In the present
study, we explored several methods to evaluate the importance
of features (genes) for age prediction. Genes were ranked by
four methods: differential expression analysis; linear
regression with elastic net regularization (ElasticNet; genes
ranked by the absolute values of their regression
coefficients); Random Forest (Gini importance value of each
gene); and the Deep Feature Selection model, for which we
explored the relative importance values assigned to genes,
averaged over the five-fold cross-validation process.

In addition to feature importance ranking, we also
explored the wrapper method, which we have successfully
applied previously in the context of identifying the most
important blood markers for age prediction (Putin et al.,
2016; Mamoshina et al., 2018). We applied the same technique
in the present study, with some modification. Here we
explored random permutations of vectors of gene expression
values along with increased (log2 fold change of 3) and
decreased (log2 fold change of -3) gene expression values.

In the case of random permutations, $x'_i = \mathrm{rand}(x_i)$, where
$x_i$ is the vector of expression values of gene $i$.

In the case of a direct increase or decrease, $x'_i = x_i \times 2^{f}$,
where $x_i$ is the vector of expression values of gene $i$ and $f$ is a
log2 fold change of 3 or -3, respectively.
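The two kinds of perturbation can be written as small Python helpers. This is a minimal sketch with hypothetical function names; it reads rand(x) as a random permutation of a gene's expression values across samples and applies the fold change as a multiplication by 2 raised to f.

```python
import random


def permute_gene(expression, rng):
    """Return a randomly permuted copy of one gene's expression
    vector (one value per sample); the input is left unchanged."""
    shuffled = list(expression)
    rng.shuffle(shuffled)
    return shuffled


def scale_gene(expression, log2_fold_change):
    """Multiply every expression value by 2 ** log2_fold_change,
    e.g. +3 for an 8-fold increase or -3 for an 8-fold decrease."""
    factor = 2.0 ** log2_fold_change
    return [value * factor for value in expression]
```

If the expression values are already log2-transformed, the equivalent operation would be adding f rather than multiplying; the text does not say which scale is used, so the multiplicative form is kept here.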

Therefore, the feature importance value for gene i is
calculated as,

where Y is a vector of predicted age values, Y' is a vector of
predicted age values after permutation, and k is the number of
cross-validation folds, which in this case equals 5.

TABLE B

Species for all entries: Homo sapiens.

Gene Name | Ensembl gene ID | David Gene Name
ACACB | ENSG00000076555 | acetyl-CoA carboxylase beta(ACACB)
ADORA2B | ENSG00000170425 | adenosine A2b receptor(ADORA2B)
AKIRIN2 | ENSG00000135334 | akirin 2(AKIRIN2)
AMACR | ENSG00000242110 | alpha-methylacyl-CoA racemase(AMACR)
ANKRD54 | ENSG00000100124 | ankyrin repeat domain 54(ANKRD54)
ARFGAP3 | ENSG00000242247 | ADP ribosylation factor GTPase activating protein 3(ARFGAP3)
ARHGAP26 | ENSG00000145819 | Rho GTPase activating protein 26(ARHGAP26)
BAIAP2 | ENSG00000175866 | BAI1 associated protein 2(BAIAP2)

We used a Support Vector Machine algorithm as an age
predicting model. Each model predicts age after a modifi- 
cation of gene expression values and assigns an importance 
coefficient to the gene based on the accuracy of age predic- 
tion. Afterwards, scores obtained on the validation sets are 
summed, and each gene-associated importance factor is 
averaged to yield a final value. 
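The fold-wise scoring and averaging described above can be sketched as follows. Because the scoring formula itself is not reproduced in this text, the sketch assumes that a gene's score in one fold is the mean absolute change in predicted age after the gene is perturbed; that choice, the function name and the use of plain callables in place of trained Support Vector Machine models are all assumptions for illustration.

```python
def wrapper_importance(fold_models, samples, gene_index, perturb):
    """Average, over cross-validation folds, of the mean absolute
    change in predicted age when one gene is perturbed.

    fold_models: list of callables mapping a sample (list of
        expression values) to a predicted age, one per fold.
    samples: list of samples, each a list of expression values.
    perturb: callable that takes the gene's expression vector
        across samples and returns a modified vector.
    """
    column = [row[gene_index] for row in samples]
    new_column = perturb(column)
    perturbed = [
        row[:gene_index] + [value] + row[gene_index + 1:]
        for row, value in zip(samples, new_column)
    ]
    fold_scores = []
    for predict in fold_models:
        before = [predict(row) for row in samples]
        after = [predict(row) for row in perturbed]
        change = sum(abs(a - b) for a, b in zip(after, before))
        fold_scores.append(change / len(samples))
    return sum(fold_scores) / len(fold_scores)
```

For simplicity the same samples are passed to every fold model; in a real cross-validation run each model would be scored on its own validation split.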

The Borda count algorithm was applied to summarize all six
ranks derived from age predicting models, and the rank of 
genes sorted by absolute log 2 fold change values derived 
from differential expression analysis, in order to obtain the 
final importance rank of genes. 
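A Borda count over several rankings can be sketched in Python as follows. This is the generic textbook form (a gene in position p of a ranking of n genes earns n - 1 - p points), not necessarily the exact variant used in the study; the function name is illustrative and the example rankings are made up, using gene names from Table B only as labels.

```python
def borda_count(rankings):
    """Combine several rankings (each a list of genes, most
    important first) into one consensus ranking.

    Each gene earns (n - 1 - position) points per ranking of
    length n; ties in total score are broken alphabetically.
    """
    scores = {}
    for ranking in rankings:
        n = len(ranking)
        for position, gene in enumerate(ranking):
            scores[gene] = scores.get(gene, 0) + (n - 1 - position)
    return sorted(scores, key=lambda gene: (-scores[gene], gene))


# Illustrative (made-up) rankings from three hypothetical methods.
example_rankings = [
    ["SOD1", "RELA", "CDK6"],
    ["RELA", "SOD1", "CDK6"],
    ["SOD1", "CDK6", "RELA"],
]
```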

Table A provides 49 genes that are determined to be
significantly important, in a preferred embodiment, for age
prediction, grouped by disease and molecular function
category.

TABLE A

Category | List of genes in each category
Metabolism and energy homeostasis | ACACB, SCD, ALDOC, SMOX, AMACR, HTRA1, ARG1, HLCS, HSD3B7, PECI
Hypertension and hypoxia | PTGDS, HPGD, NT5E, TMSB4Y, ADORA2B, ACTN1, SNTB2
Neuropathy | NETO2, GRM2, CACNA1I, NRCAM, CCT5, BAIAP2, QPRT, TMEM18, PPP1R9B
Genomic stability | TOP1MT, PARP3, NOTCH1, TAF7, TINF2, CHTOP, CTBP1, CBX7, RRP1, RNF144, PNPT1, C16orf42
Smooth muscle construction | ADORA2B, SOD1
Age-related macular degeneration | HTRA1
Tumor angiogenesis | CD248, VASH1, SERTAD3, TNFSF8, YWHAE, CRK, CBLL1, CDCA7L, E2F4
Inflammation | AKIRIN2, DEFB123, PLXNC1, PSMD12, RELA

Table B lists 100 gene names and abbreviations, all
human, used for transcriptome clock analysis in a preferred
embodiment.


TABLE B-continued

Species for all entries: Homo sapiens.

Gene Name | Ensembl gene ID | David Gene Name
BET1 | ENSG00000105829 | Bet1 golgi vesicular membrane trafficking protein(BET1)
BPNT1 | ENSG00000162813 | 3'(2'), 5'-bisphosphate nucleotidase 1(BPNT1)
C16orf42 | ENSG00000007520 | TSR3, Acp Transferase Ribosome Maturation Factor
C17orf48 | ENSG00000170222 | ADP-Ribose/CDP-Alcohol Diphosphatase, Manganese
C1orf77 | ENSG00000160679 | Chromatin Target Of PRMT1
C9orf91 | ENSG00000157693 | Transmembrane Protein 268
CACNA1I | ENSG00000100346 | calcium voltage-gated channel subunit alpha1 I(CACNA1I)
CBLL1 | ENSG00000105879 | Cbl proto-oncogene like 1(CBLL1)
CBX7 | ENSG00000100307 | chromobox 7(CBX7)
CCT5 | ENSG00000150753 | chaperonin containing TCP1 subunit 5(CCT5)
CD248 | ENSG00000174807 | CD248 molecule(CD248)
CDCA7L | ENSG00000164649 | cell division cycle associated 7 like(CDCA7L)
CDK6 | ENSG00000105810 | cyclin dependent kinase 6(CDK6)
CLDN14 | ENSG00000159261 | claudin 14(CLDN14)
CLIC3 | ENSG00000169583 | chloride intracellular channel 3(CLIC3)
COBRA1 | ENSG00000188986 | Negative Elongation Factor Complex Member B
CRK | ENSG00000167193 | CRK proto-oncogene, adaptor protein(CRK)
CTBP1 | ENSG00000159692 | C-terminal binding protein 1(CTBP1)
DAPP1 | ENSG00000070190 | dual adaptor of phosphotyrosine and 3-phosphoinositides 1(DAPP1)
DBNDD2 | ENSG00000244274 | dysbindin domain containing 2(DBNDD2)
DEFB123 | ENSG00000180424 | defensin beta 123(DEFB123)
DERPC | ENSG00000168802 | Chromosome Transmission Fidelity Factor 8
DHTKD1 | ENSG00000181192 | dehydrogenase E1 and transketolase domain containing 1(DHTKD1)
E2F4 | ENSG00000205250 | E2F transcription factor 4(E2F4)
FANCL | ENSG00000115392 | Fanconi anemia complementation group L(FANCL)
FLJ10374 | ENSG00000105248 | coiled-coil domain containing 94
FLJ43093 | ENSG00000255587 | RAB44, Member RAS Oncogene Family
FZD1 | ENSG00000157240 | frizzled class receptor 1(FZD1)
GALNS | ENSG00000141012 | galactosamine (N-acetyl)-6-sulfatase(GALNS)
GALNT6 | ENSG00000139629 | polypeptide N-acetylgalactosaminyltransferase 6(GALNT6)
GATAD2A | ENSG00000167491 | GATA zinc finger domain containing 2A(GATAD2A)
GLT1D1 | ENSG00000151948 | glycosyltransferase 1 domain containing 1(GLT1D1)
GPA33 | ENSG00000143167 | glycoprotein A33(GPA33)
GRM2 | ENSG00000164082 | glutamate metabotropic receptor 2(GRM2)
HSD3B7 | ENSG00000099377 | hydroxy-delta-5-steroid dehydrogenase, 3 beta- and steroid delta-isomerase 7(HSD3B7)
LDOC1L | ENSG00000188636 | leucine zipper down-regulated in cancer 1 like(LDOC1L)
LIPN | ENSG00000204020 | lipase family member N(LIPN)
LMCD1 | ENSG00000071282 | LIM and cysteine rich domains 1(LMCD1)
LOC100130298 | ENSG00000258130 | hCG1816373-like(LOC100130298)
LOC285908 | ENSG00000179406 | Long Intergenic Non-Protein Coding RNA 174
LOC613038 | ENSG00000258130 | SAGA complex associated factor 29 pseudogene(LOC613038)
LOC643905 | ENSG00000221961 | Proline Rich 21
LOC652784 | NA | NA
LOC653884 | NA | serine/arginine-rich splicing factor 10-like
LOC729338 | ENSG00000224786 | Centrin 4, Pseudogene (CETN4P)
LOC731444 | NA | NA
LRP3 | ENSG00000130881 | LDL receptor related protein 3(LRP3)
MFNG | ENSG00000100060 | MFNG O-fucosylpeptide 3-beta-N-acetylglucosaminyltransferase(MFNG)
NETO2 | ENSG00000171208 | neuropilin and tolloid like 2(NETO2)
NRCAM | ENSG00000091129 | neuronal cell adhesion molecule(NRCAM)
NTSR2 | ENSG00000169006 | neurotensin receptor 2(NTSR2)
NUDT5 | ENSG00000165609 | nudix hydrolase 5(NUDT5)
PACSIN2 | ENSG00000100266 | protein kinase C and casein kinase substrate in neurons 2(PACSIN2)
PARP3 | ENSG00000041880 | poly(ADP-ribose) polymerase family member 3(PARP3)
PARP8 | ENSG00000151883 | poly(ADP-ribose) polymerase family member 8(PARP8)
PECI | ENSG00000198721 | Enoyl-CoA Delta Isomerase 2
PLXNC1 | ENSG00000136040 | plexin C1(PLXNC1)
PNPT1 | ENSG00000138035 | polyribonucleotide nucleotidyltransferase 1(PNPT1)
PPP1R9B | ENSG00000108819 | protein phosphatase 1 regulatory subunit 9B(PPP1R9B)
PSMD12 | ENSG00000197170 | proteasome 26S subunit, non-ATPase 12(PSMD12)
QPRT | ENSG00000103485 | quinolinate phosphoribosyltransferase(QPRT)
RAB3D | ENSG00000105514 | RAB3D, member RAS oncogene family(RAB3D)
RELA | ENSG00000173039 | RELA proto-oncogene, NF-kB subunit(RELA)
RGMB | ENSG00000174136 | repulsive guidance molecule family member b(RGMB)
RNASET2 | ENSG00000026297 | ribonuclease T2(RNASET2)
RNF144 | ENSG00000151692 | Ring Finger Protein 144A
RRP1 | ENSG00000160214 | ribosomal RNA processing 1(RRP1)
S100A9 | ENSG00000163220 | S100 calcium binding protein A9(S100A9)
SERTAD3 | ENSG00000167565 | SERTA domain containing 3(SERTAD3)
SGPL1 | ENSG00000166224 | sphingosine-1-phosphate lyase 1(SGPL1)
SIGLEC7 | ENSG00000168995 | sialic acid binding Ig like lectin 7(SIGLEC7)
SLC25A19 | ENSG00000125454 | solute carrier family 25 member 19(SLC25A19)
SLC38A10 | ENSG00000157637 | solute carrier family 38 member 10(SLC38A10)
SOD1 | ENSG00000142168 | superoxide dismutase 1, soluble(SOD1)
SRPRB | ENSG00000144867 | SRP receptor beta subunit(SRPRB)
TAF7 | ENSG00000178913 | TATA-box binding protein associated factor 7(TAF7)
TCTN3 | ENSG00000119977 | tectonic family member 3(TCTN3)
TIGD7 | ENSG00000140993 | tigger transposable element derived 7(TIGD7)
TINF2 | ENSG00000092330 | TERF1 interacting nuclear factor 2(TINF2)
TMEM18 | ENSG00000151353 | transmembrane protein 18(TMEM18)
TMSB4Y | ENSG00000154620 | thymosin beta 4, Y-linked(TMSB4Y)
TNFSF8 | ENSG00000106952 | tumor necrosis factor superfamily member 8(TNFSF8)
TRIM7 | ENSG00000146054 | tripartite motif containing 7(TRIM7)
TSPAN10 | ENSG00000182612 | tetraspanin 10(TSPAN10)
VKORC1L1 | ENSG00000196715 | vitamin K epoxide reductase complex subunit 1 like 1(VKORC1L1)
VTI1B | ENSG00000100568 | vesicle transport through interaction with t-SNAREs 1B(VTI1B)
YWHAE | ENSG00000108953 | tyrosine 3-monooxygenase/tryptophan 5-monooxygenase activation protein epsilon(YWHAE)
ZNF259 | ENSG00000109917 | ZPR1 Zinc Finger
ZNF544 | ENSG00000198131 | zinc finger protein 544(ZNF544)
ZNF583 | ENSG00000198440 | zinc finger protein 583(ZNF583)
ZNF697 | ENSG00000143067 | zinc finger protein 697(ZNF697)
ZNF763 | ENSG00000197054 | zinc finger protein 763(ZNF763)



FIG. 10 is a Venn diagram showing selected gene list overlap.
The four-way Venn diagram illustrates all unique, two-way,
three-way and four-way sets of shared genes. Gene lists
were selected using the deep transcriptomic aging clocks
described herein. A set of genes that is common to all
tissues could be considered a set of universal aging-related
targets that could be used to develop therapies.

Under the pressure of environmental factors and hereditary
characteristics, the rate of aging naturally varies between
individuals. As a result, biological age as defined by
biomarkers often differs between individuals of the same
chronological age. Biomarkers of biological aging are the
objective physiological indicators of tissue and organ
condition that are used to assess personal aging rates. Aging
is of course associated with health risks, an inability to
maintain homeostasis, and an eventual prognosis of death from
age-related diseases.

The biomarkers of biological aging as described herein
can be used to evaluate the effectiveness of anti-aging
remedies. This is important because populations in developed
nations throughout the world are rapidly aging, and the search
for and identification of efficient anti-aging interventions
has never been more essential.

Because aging is a complex multifactorial process with no
single cause or treatment (Zhavoronkov 2011; Trindade,
2013) that affects most if not all tissues and organs of the
body, the biomarkers currently available in the art do not
accurately represent the health state of the entire organism
or of individual systems, and do not provide accurate and
useful measures of biological age. Furthermore, several of
them are not easily measured. Thus, biomarkers based on
characteristics that are not only quantifiable but also easily
measurable are still required.

Usually, identifying and developing biomarkers is a
multi-step process that includes proof of concept,
experimental validation and analytical performance validation.
Nevertheless, alternative approaches based on in silico
methods can also be used to improve and speed up the
development and validation of these biomarkers. The use of
more effective computational approaches for the development
of biomarkers is favored by two technological trends. The
first is the accumulation of high-throughput data generated
from different research areas such as proteomics, genomics,
chemoproteomics and phenomics. The second is the progress
made in computational sciences that, combined with
increasingly powerful computational resources, allows the
development of repurposing algorithms, of software for
retrospective analysis, and of the web-based databases
required for gathering and classifying experimental data
(Lavecchia, 2016). Using these computational resources,
various techniques such as Machine Learning (ML) are routinely
used in biomarker development.

Although Deep Learning (DL) methods were initially
developed for tasks such as pattern, voice and image
recognition (Oquab 2014), they can also be used to improve
the efficiency of in silico techniques applied to biomarker
identification. DL-based methods are able to overcome many
current limitations of more traditional in silico techniques,
for instance in integrating complex biomedical data. Modern
DL techniques include powerful approaches with deep
architectures, called Deep Neural Networks (DNNs). Neural
networks are collections of neurons (also called units)
connected in an acyclic graph. Neural network models are often
organized into distinct layers of neurons.

For most neural networks, the most common layer type is
the fully-connected layer, in which neurons in two adjacent
layers are fully pairwise connected but neurons within a
single layer share no connections. One of the main features
of DNNs is that neurons are controlled by non-linear
activation functions. This non-linearity, combined with the
deep architecture, makes possible more complex combinations
of the input features, leading ultimately to a wider
understanding of the relationships between them and, as a
result, to a more reliable final output. DNNs have already
been applied to many types of data, ranging from structural
data to chemical descriptors and transcriptomics data (Mayr
2016, Wang 2014, Ma 2015). Because of this flexibility and
adaptability in learning from a large range of data, DNNs are
now considered an interesting computational approach for
tackling many current biomedical issues (Mamoshina 2016, Xu
2015, Hughes 2015).
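The fully-connected layer and non-linear activation described above can be made concrete with a minimal Python sketch of a forward pass. The weights, layer sizes and the choice of ReLU as the activation are illustrative assumptions, not parameters of any model in this disclosure.

```python
def relu(z):
    """Rectified linear unit, a common non-linear activation."""
    return max(0.0, z)


def identity(z):
    """Linear output, as typically used for a regression target."""
    return z


def dense(inputs, weights, biases, activation):
    """One fully-connected layer: every output neuron combines every
    input, and neurons within the layer do not connect to each other."""
    return [
        activation(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


def forward(inputs, layers):
    """Pass the inputs through a stack of (weights, biases, activation)."""
    values = inputs
    for weights, biases, activation in layers:
        values = dense(values, weights, biases, activation)
    return values


# A toy network: 2 inputs -> 2 hidden neurons (ReLU) -> 1 output.
toy_layers = [
    ([[1.0, -1.0], [0.5, 0.5]], [0.0, -1.0], relu),
    ([[2.0, 1.0]], [0.5], identity),
]
```

For the input `[3.0, 1.0]` the hidden layer produces `[2.0, 1.0]` and the output is `[5.5]`; for `[1.0, 3.0]` the first hidden neuron is clipped to zero by the ReLU and the output is `[1.5]`.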

Recently, Putin et al. (Putin, 2016) published promising
results demonstrating the capacity of DNN-based methods to
accurately predict biological age and to identify a set of the
most relevant biomarkers for tracking physiological processes
related to aging. In their study, the features used as inputs
for the DNN, a set of 41 biomarkers for each sample, were
extracted from tens of thousands of blood biochemistry samples
from patients undergoing routine physical examinations.
Although highly variable in nature, a blood biochemistry test
is in practice very simple to perform and is approved for
clinical use; as a consequence, it is commonly used by
physicians. An effective DNN structure was obtained using
56177 samples for the training phase (fitting of
hyperparameters), with the remaining 6242 samples used for
validation. The results obtained for predicting biological age
show that the DNN-based approach outperforms many traditional
machine learning methods, including GBM (Gradient Boosting
Machine), RF (Random Forests), DT (Decision Trees), LR (Linear
Regression), KNN (k-Nearest Neighbors), ElasticNet, and SVM
(Support Vector Machines).

Furthermore, the PFI (Permutation Feature Importance)
method was used to compute the relative importance of each
biomarker used to estimate biological age. This information
can be used in two ways. First, as each biomarker aims at
measuring a specific biological mechanism, this ranking can
be exploited to optimize anti-aging strategies by targeting
the most critical biological processes identified as playing a
key role in the onset and propagation of aging. Second, this
list can be used to reduce the number of initial inputs
required to generate an accurate prediction of biological age.
Regarding this second point, the results presented in the
study show that although each sample initially contains up
to 46 biomarkers, the performance of the DNNs remained
remarkably stable with an input comprising only the 10
markers with the highest PFI scores. Thus, PFI provides a
ranked list of biomarkers that can be used to select the most
robust and reliable features for predicting age.
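
The permutation idea described above can be sketched as follows. This is a minimal illustration using only the Python standard library, not the procedure used in the cited study: the predictor `toy_predict`, the helper `permutation_importance`, the number of repeats, and the synthetic data are all assumptions made for the example. A feature's score is taken here as the mean increase in MAE after that feature's column is shuffled.

```python
import random


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between true and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(predict, X, y, n_repeats=5, seed=0):
    """Per feature, the mean increase in MAE after shuffling that feature."""
    rng = random.Random(seed)
    baseline = mean_absolute_error(y, [predict(row) for row in X])
    scores = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between feature j and the target
            shuffled = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            err = mean_absolute_error(y, [predict(row) for row in shuffled])
            increases.append(err - baseline)
        scores.append(sum(increases) / n_repeats)
    return scores


# Hypothetical stand-in for a trained age predictor: only features 0 and 2 matter.
def toy_predict(row):
    return 40.0 + 3.0 * row[0] + 0.5 * row[2]


data_rng = random.Random(1)
X = [[data_rng.gauss(0, 1) for _ in range(4)] for _ in range(200)]
y = [toy_predict(row) for row in X]

pfi = permutation_importance(toy_predict, X, y)
ranking = sorted(range(len(pfi)), key=lambda j: pfi[j], reverse=True)
top_2 = ranking[:2]  # indices of the two highest-scoring features
```

Keeping only the top-ranked indices (here `top_2`) corresponds to the input reduction discussed above, where a model is re-evaluated on the highest-scoring markers only.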

The growing body of experimental evidence on life extension
in model organisms suggests the feasibility of finding
interventions promoting human longevity (Moskalev A 2017).
However, the restricted experimental possibilities of
studying human aging and the overall low translation rate
from model organisms to the human clinic in other
therapeutic areas (Mak, Evaniew, and Ghert 2014) complicate
the search for desirable anti-aging therapies, and only a few
geroprotectors (anti-aging molecules) have shown potential
efficacy in humans (A. Aliper et al. 2016; I. Thomas and
Gregg 2017; A. M. Aliper et al. 2015).

Over the past several decades, research into the molecular
basis of human aging has progressed significantly. Changes
in gene expression, which are associated with numerous
biological processes, cellular responses and disease states,
most likely play a crucial role in the aging process
(de Magalhães, Curado, and Church 2009).

Because biological aging is not a single signature, but is
highly specific in terms of organs, tissues, systems, and other
granular aspects of the organism (including humans), an
effective and useful biological clock must utilize many
biomarkers from many tissues and organs. The following are
some preferred examples.

Energy Metabolism:

Glycolysis, glucose oxidation, and fatty acid oxidation are
the main sources of ATP generation, which is crucial for the
viability of tissue with high energy demand, such as muscle
tissue, and especially cardiomyocytes. The aging process
triggers abnormalities in metabolism and energy homeostasis
(Ma and Li 2015), and aging biomarkers specific to energy
metabolism are a subject of the present invention.

Hypertension and Hypoxia: 

Prostaglandins are critical for regulating vasodilation and
vasoconstriction and for maintaining vascular homeostasis. A
balance of vasodilating and vasoconstricting agents is
important to maintain normal vascular function. The aging
process shifts the balance toward pro-constrictive agents
and hypertension, which is a common vascular complication in
the elderly (Pinto 2007).

Regardless of the particular biomarkers assessed by a
biological aging assessment compatible with the current
invention, a preferred embodiment of the deep learning
computational approach for both the current invention and
the biological aging assessment is as follows. First, a
specific type of DNN called Deep Feature Selection (DFS) is
trained on blood gene expression samples using a standard
backpropagation algorithm. Second, the DFS model is applied
to select a set of age-related genes using different
DNN-based feature selection methods combined into one
ensemble model via a genetic algorithm.

During the first step, the DFS model is trained, for example,
on 4000 healthy human blood gene expression samples
extracted from GEO (GSE33828). DFS (Li et al.) is a type of
neural network with several specific characteristics. First,
DFS adds a particular hidden layer, called a weighted
layer, whose neurons are connected one-to-one with the input
features. The neurons in the weighted layer are then
connected one-to-many with the neurons in the first normal
hidden layer of a deep feed-forward multilayer neural
network. Second, DFS introduces several regularization
terms in the neural network loss function. An exemplary
final loss function expression is as follows:


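$$\min_{\theta} f(\theta) = l(\theta) + \lambda_1 \left( \frac{1-\lambda_2}{2} \|w\|_2^2 + \lambda_2 \|w\|_1 \right) + \alpha_1 \left( \frac{1-\alpha_2}{2} \sum_{k=1}^{K+1} \|W^{(k)}\|_F^2 + \alpha_2 \sum_{k=1}^{K+1} \|W^{(k)}\|_1 \right)$$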


where l(θ) is the log-likelihood of the data, and λ1, λ2, α1
and α2 are regularization parameters. K is the number of
hidden layers. ||w||_2^2 and ||w||_1 stand for the l2 and l1
norms of the weights in the weighted layer, respectively.
||·||_F stands for the Frobenius norm and ||·||_1 for the
matrix norm. The last two terms are the ElasticNet-based
terms that control smoothness/sparsity for the weights of the
weighted layer. They reduce the model complexity and speed
up the training. After the DFS model is trained, the absolute
values of the weights in the weighted layer can be used as a
ranking list for the input features (genes).
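
As a hedged illustration of how such a regularized objective and the resulting feature ranking might be evaluated, the sketch below assumes an elastic-net form, (1 - λ2)/2 times a squared l2 term plus λ2 times an l1 term, for both the weighted layer and the remaining weight matrices, and it treats the matrix l1 norm as an entrywise sum of absolute values. The function names, the toy weights, and the gene names are hypothetical.

```python
def elastic_net_vector(w, lam1, lam2):
    """Elastic-net penalty on the one-to-one weighted-layer weights w."""
    l2_squared = sum(v * v for v in w)
    l1 = sum(abs(v) for v in w)
    return lam1 * ((1.0 - lam2) / 2.0 * l2_squared + lam2 * l1)


def elastic_net_matrices(matrices, alpha1, alpha2):
    """Elastic-net-like penalty summed over the remaining weight matrices.

    Assumption: the matrix l1 norm is taken entrywise.
    """
    entries = [v for W in matrices for row in W for v in row]
    frobenius_squared = sum(v * v for v in entries)
    l1 = sum(abs(v) for v in entries)
    return alpha1 * ((1.0 - alpha2) / 2.0 * frobenius_squared + alpha2 * l1)


def dfs_objective(data_loss, w, matrices, lam1, lam2, alpha1, alpha2):
    """Data term plus both regularization terms."""
    return (data_loss
            + elastic_net_vector(w, lam1, lam2)
            + elastic_net_matrices(matrices, alpha1, alpha2))


def rank_features(w, names):
    """Order feature names by decreasing absolute weighted-layer weight."""
    order = sorted(range(len(w)), key=lambda i: abs(w[i]), reverse=True)
    return [names[i] for i in order]


# Toy values: three input genes, one 2x2 hidden weight matrix.
w = [0.5, -2.0, 0.0]
hidden = [[[1.0, -1.0], [0.0, 2.0]]]
total = dfs_objective(1.0, w, hidden, lam1=0.1, lam2=0.5, alpha1=0.01, alpha2=0.5)
ranked = rank_features(w, ["geneA", "geneB", "geneC"])
```

In this toy case the gene whose weighted-layer weight has the largest magnitude ranks first, which mirrors the use of absolute weights as a ranking list.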

During the second step, DNN-based feature selection
methods are used to select age-related genes. Each method
produces a ranked list of relative importance for each gene.
In addition to the ranking of input features available with the
DFS model itself, other methods have been applied. These
include the permutation feature importance (PFI) method as
previously described in (Putin et al.), the heuristic variable
selection (HVS) method (Yacoub et al.) and methods based on
output derivatives. The notable characteristic of these
methods is that they can be applied to already trained DNNs.
It is not necessary to iteratively retrain DNNs as required by
the forward or backward feature selection methods.

Heuristic Variable Selection (Yacoub et al.) is a zero-order
method designed for measuring the relative importance of the
input features of a neural network. The method requires the
set of weight values and information related to the DNN
structure as inputs. In a preferred embodiment, the relative
importance of each given input feature is computed as
follows:

where I, H, O are the number of input, hidden and output
layers, respectively. Note wji denotes the weight between
neurons j and i. After the training of the DNN and the
computation of S for each input feature i, the set of S values
can be assembled as a ranked list.
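
A minimal sketch of a weight-based relevance score of this kind is given below. It assumes one commonly cited form of the HVS measure for a network with a single hidden layer, in which S_i sums, over hidden neurons j and output neurons o, the product of the share of |w_ji| among all input weights of j and the share of |w_oj| among all hidden weights of o. That form, the function name `hvs_importance`, and the toy weights are assumptions for illustration and may differ from the expression used in the preferred embodiment.

```python
def hvs_importance(w_hidden, w_output):
    """Weight-based relevance of each input feature (assumed HVS-style form).

    w_hidden[j][i] is the weight from input i to hidden neuron j.
    w_output[o][j] is the weight from hidden neuron j to output neuron o.
    Assumes every neuron has at least one non-zero incoming weight.
    """
    n_inputs = len(w_hidden[0])
    scores = [0.0] * n_inputs
    for out_row in w_output:
        out_total = sum(abs(v) for v in out_row)
        for j, hid_row in enumerate(w_hidden):
            hid_total = sum(abs(v) for v in hid_row)
            out_share = abs(out_row[j]) / out_total
            for i in range(n_inputs):
                scores[i] += out_share * abs(hid_row[i]) / hid_total
    return scores


# Toy network: 3 inputs, 2 hidden neurons, 1 output.
w_hidden = [[1.0, -1.0, 0.0],
            [0.0, 3.0, 1.0]]
w_output = [[2.0, -2.0]]

S = hvs_importance(w_hidden, w_output)
ranked_inputs = sorted(range(len(S)), key=lambda i: S[i], reverse=True)
```

Because only trained weights are read, no retraining or extra forward passes are needed, which is the property that distinguishes this family of methods from forward or backward selection.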

There are various first-order methods to measure the
relative importance of an input feature. These methods use
either the derivative of the error or the derivative of the
output of the neural network with respect to this input
feature to establish the ranked list. An interesting property
of the derivative-based methods is that they can be applied
to any type of differentiable model. The procedure to compute
the average relevance of the input feature and how the
derivative term is included are specific to each
derivative-based method. Here we consider the long-studied
derivative-based methods described in detail in (Dorizzi et
al.), (Ruck et al.), (Refenes et al.), (Czernichow et al.). In
the following formulas,

$$\frac{\partial F_j}{\partial x_i}(x^l)$$

denotes the derivative of the output of unit j of the network
with respect to x_i at the point x^l, F_j(x^l) is the output
of the network with x^l as input, and N is the number of
samples. Where specified, M is the number of outputs of the
network, var stands for the variance, and q95 for the 95th
percentile. The relative importance S_i of an input feature i
is presented below for each method.

The biological aging assessment uses, as an example: 

1) The model developed by Ruck et al. which is the 
following: 


2) Refenes et al. have developed three different models:

3) The model of Dorizzi et al. takes the following form:


) 


4) The model of Czernichow et al. is as follows: 


ð 
Si = al Lo 
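
As a hedged illustration of the derivative-based family listed above, the sketch below scores each input by the sum, over all samples and outputs, of the absolute value of the output derivative with respect to that input. This sum-of-absolute-derivatives choice, the use of central finite differences in place of analytic gradients, and the names `partial_derivative`, `abs_derivative_saliency` and `toy_network` are assumptions for the example; the individual models named above aggregate the derivative term in their own ways.

```python
def partial_derivative(f, x, i, j, eps=1e-5):
    """Central finite-difference estimate of dF_j/dx_i at the point x."""
    up = list(x)
    up[i] += eps
    down = list(x)
    down[i] -= eps
    return (f(up)[j] - f(down)[j]) / (2.0 * eps)


def abs_derivative_saliency(f, samples, n_outputs):
    """S_i = sum over samples l and outputs j of |dF_j/dx_i (x^l)|."""
    n_inputs = len(samples[0])
    return [
        sum(abs(partial_derivative(f, x, i, j))
            for x in samples
            for j in range(n_outputs))
        for i in range(n_inputs)
    ]


# Hypothetical differentiable model with one output; input 2 is ignored.
def toy_network(x):
    return [2.0 * x[0] + x[1] ** 2]


samples = [[0.0, 1.0, 5.0], [1.0, -2.0, 3.0]]
saliency = abs_derivative_saliency(toy_network, samples, n_outputs=1)
order = sorted(range(len(saliency)), key=lambda i: saliency[i], reverse=True)
```

For a trained DNN the derivatives would normally come from backpropagation rather than finite differences; the ranking step is the same.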


The final list of ranked genes is obtained by combining the
different lists described above using a simple genetic
algorithm (GA). In a preferred embodiment, the GA proceeds
as follows.

The initial population of the GA is initialized with all
feature ranking lists obtained by applying the aforementioned
feature selection algorithms to both the DNN and DFS models.
On each iteration the GA performed 35 crossover operations
between members of its population and 15 mutation
operations, during which random genes were injected into the
GA. Thus, at each iteration, 50 DNNs were trained.
Convergence of the GA was reached after 50 epochs, and the
final gene ranking list was obtained. The best DNN model in
the GA achieved a coefficient of determination of 0.79 and a
mean absolute error of 4.2 on the validation dataset. FIG. 3
shows the performance of the DNN for predicting the age of
healthy individuals (Rsq=0.79).
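
The loop structure described above (35 crossovers and 15 mutations per iteration, 50 iterations) can be sketched as follows. This is a simplified illustration and not the embodiment itself: the crossover and mutation operators, the elitist selection step, and the stand-in `fitness` function are assumptions, and in the described method each candidate would instead be scored by training a DNN.

```python
import random


def crossover(a, b, rng):
    """Keep a prefix of ranking a, then append b's remaining genes in b's order."""
    cut = rng.randint(1, len(a) - 1)
    head = a[:cut]
    seen = set(head)
    return head + [g for g in b if g not in seen]


def mutate(ranking, all_genes, rng):
    """Inject a randomly chosen gene at a random position in the ranking."""
    gene = rng.choice(all_genes)
    child = [g for g in ranking if g != gene]
    child.insert(rng.randrange(len(child) + 1), gene)
    return child


def evolve(initial_lists, all_genes, fitness,
           n_iter=50, n_cross=35, n_mut=15, seed=0):
    """Combine ranked lists; 50 candidates are scored per iteration."""
    rng = random.Random(seed)
    population = [list(r) for r in initial_lists]
    for _ in range(n_iter):
        children = [crossover(*rng.sample(population, 2), rng)
                    for _ in range(n_cross)]
        children += [mutate(rng.choice(population), all_genes, rng)
                     for _ in range(n_mut)]
        # Assumed elitist selection: keep the best candidates seen so far.
        pool = population + children
        population = sorted(pool, key=fitness, reverse=True)[:len(initial_lists)]
    return population[0]


# Toy setup: eight genes and a hidden "true" ordering used only by the stand-in fitness.
truth = list("ABCDEFGH")


def fitness(ranking):
    return -sum(abs(ranking.index(g) - truth.index(g)) for g in truth)


init_rng = random.Random(42)
initial = [init_rng.sample(truth, len(truth)) for _ in range(5)]
best = evolve(initial, truth, fitness)
```

Because the best candidates are carried over between iterations, the best score never decreases; whether the embodiment uses the same selection rule is not stated, so this is only one plausible reading.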

Cellular Life Span, Aging, Tissue-Specific Age Predic- 
tion, thus, biological aging assessment compatible with the 
current invention. 

As discussed above, different cells and tissues exhibit
different expression patterns, different aging patterns, and
different life spans. This substantial variation means that it
is useful to have aging clocks that are specific to different
cells, tissues, and organs (Seim, Ma, and Gladyshev 2016).
In a preferred embodiment, we utilize DNN-based predictors
of age trained on 12 tissues and 4 tissue-specific DNN-based
predictors of age trained on gene expression profiles of a
mononuclear whole blood fraction.

Although the universal 12-tissue predictor is trained on a
data set with a larger sample size than the 4 tissue-specific
deep aging clocks, its prediction performance is
significantly worse (11.2 years for the best network
compared to 6.4, 8.2, 7.8 and 8.3 years for the Blood, Brain,
Liver and M. Blood-based predictors, respectively).

In a preferred embodiment, we utilize a DFS algorithm for
feature ranking to identify the most important genes in age
prediction for the universal 12-tissue predictor of age
as well as the 4 tissue-specific predictors of age.

In an implementation of the method, a universal 12-tissue
predictor is trained on a data set with a larger sample
size than the 4 tissue-specific deep aging clocks;
nevertheless, its prediction performance is significantly
worse (11.2 years for the best network compared to 6.4, 8.2,
7.8 and 8.3 years for the Blood, Brain, Liver and M.
Blood-based predictors, respectively).

Data from up to 51,139 samples profiled on the GPL570
microarray platform were used to train and test our DNNs.
The GEO accession number GPL570 refers to data generated
using the common Affymetrix Human Genome U133 Plus 2.0
Array, which covers approximately 47,000 transcripts,
although only 12,328 or 12,428 transcripts were used in the
study. Data were split into training and test sets with a
90:10 ratio, with exact values shown in each results
section.

Following on from the successful and highly accurate use of
our DNN to classify sex, we then attempted to predict the
age of samples. As discussed previously, we approached age
prediction as a regression-based problem. In a preferred
embodiment, 12,328 genes over a total of 20,766 samples were
used; 18,261 samples were used to train and 2,505 samples
were used to test. Our DNN-based age predictor delivered an
MAE of 11.46 years, a significant improvement over standard
machine learning models, with k-NN coming closest to
matching the DNN with an MAE of 14.973 years. A very small
increase (0.085) in MAE was observed following DFS for the
1,000 most relevant genes, suggesting that there was little
extra training capacity in the DNN using the selected gene
expression dataset.
Since we saw a clear ability of our DNN to distinguish
tissues, we investigated whether the MAE of the age
predictor would change when investigating tissue-specific
aging. In a preferred embodiment, 12,428 genes were analyzed
from 1,853 samples from whole blood (1,733 train, 120 test),
372 from brain (278 train, 49 test), 287 from liver (228
train, 47 test) and 267 mononuclear blood fractions (170
train, 97 test), again using a regression-based model.
Remarkably, in all cases a significant improvement over the
MAE of our general DNN-based age predictor was observed,
with whole blood performing especially well, generating an
MAE of 6.696. Further improvements were seen following DFS,
with a particularly large decrease in MAE observed in brain
samples (10.788 vs 8.209). In all instances the various DNNs
outperformed RF, k-NN and LR models, often producing an MAE
more than 50% smaller. In total, these observations suggest
that the transcriptomic aging clock is regulated in a
tissue-specific manner.

Multilayer (with 3 or 4 hidden layers) feed-forward neural
networks with a standard backpropagation algorithm were
used in a preferred embodiment. A Python implementation
of the Keras library with a Theano backend was used to build
and train neural networks, and the Scikit-learn library was
used to build and train random forest (RF), k-nearest
neighbor (k-NN) and linear regression (LR) models. A grid
search algorithm was used for hyperparameter optimization
in order to achieve the greatest predictive accuracy.

After rounds of optimization, the Adam optimizer with
Nesterov momentum and a learning rate of 0.01 was selected
for all models. Rectified linear unit (ReLU) or exponential
linear unit (ELU) functions were selected as activation
functions. A mean absolute error (MAE) loss function was
used in the regression task of age prediction. For
regularization purposes, models were trained with dropout
with 20-50% probability after each layer. The performance of
the best DNNs was compared to the best (with optimized
hyperparameters) RF and k-NN algorithms where appropriate.
For the purposes of this study we treated the prediction of
human age as a regression-based problem, as previously
discussed (Putin E 2017); therefore age-related experiments
are also compared against an LR model. All experiments were
conducted with 5-fold cross validation by drugs on an NVIDIA
GTX Titan Pascal with 128 Gb of RAM.
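
The evaluation protocol described above (MAE as the error measure, 5-fold cross validation) can be sketched in plain Python as follows. This is an assumption-laden illustration rather than the original pipeline: folds are contiguous and unshuffled, `fit_mean_baseline` is a hypothetical stand-in for training a DNN, RF, k-NN or LR model, and the ages are made-up values.

```python
def k_fold_indices(n_samples, k=5):
    """Yield (train_idx, test_idx) pairs with fold sizes differing by at most one."""
    fold_sizes = [n_samples // k + (1 if f < n_samples % k else 0) for f in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


def cross_validated_mae(fit, X, y, k=5):
    """Average over folds of the mean absolute error on the held-out fold."""
    fold_maes = []
    for train, test in k_fold_indices(len(y), k):
        predict = fit([X[i] for i in train], [y[i] for i in train])
        errors = [abs(y[i] - predict(X[i])) for i in test]
        fold_maes.append(sum(errors) / len(errors))
    return sum(fold_maes) / k


def fit_mean_baseline(X_train, y_train):
    """Hypothetical trivial model: always predict the mean training age."""
    mean_age = sum(y_train) / len(y_train)
    return lambda row: mean_age


# Made-up chronological ages with a single dummy feature per sample.
ages = [20, 30, 40, 50, 60, 70, 80, 30, 50, 70]
X = [[float(a)] for a in ages]
cv_mae = cross_validated_mae(fit_mean_baseline, X, ages)
folds = list(k_fold_indices(len(ages), k=5))
```

Any model exposing the same `fit(X_train, y_train) -> predict` shape could be dropped in, so the same loop could compare several model families on identical folds.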

The biological aging clocks as disclosed in the current 
invention are, not surprisingly, useful and compatible with 
senescence treatments. The following is such an example. 

A recent paper by Petkovich et al. covers the application of
epigenetic clocks to evaluate the effectiveness of anti-aging
interventions, such as caloric restriction and genetic
interventions that are known to increase lifespan (growth
hormone knockout and Snell dwarf mice) (Petkovich et al.
2017). First, the authors developed epigenetic aging clocks
and predicted the age of animals on interventions and of
matching controls. Mice on caloric restriction demonstrate a
decrease in predicted age compared to their actual
chronological age and compared to age-matched controls.
Snell dwarf mice demonstrate a greater decrease in predicted
age compared to the matching controls. Growth hormone
knockout mice also demonstrate a younger predicted
biological age.

The same suppression of age-associated DNA methylation
changes was shown not only for genetic and dietary
interventions but also for rapamycin, an mTORC1 and mTORC2
inhibitor, which promotes healthy aging and extends lifespan
(Cole et al. 2017).

Combined inhibition of both mTORC1 and mTORC2 also 
may provide a promising strategy to reverse the develop- 
ment of senescence-associated features in near-senescent 
cells (Walters, Deneka-Hannemann, and Cox 2016). 

In order to rescue cells demonstrating a pre-senescent
phenotype, a specific set of possible interventions can be
applied. These interventions include treatment with one
senoremediator compound or a combination of the
senoremediator compounds from the list below.

Activators of PI3K: Insulin receptor substrate (Tyr608) 
peptide, the sequence is established and known in the art, is 
from insulin receptor substrate-1 (IRS-1) inclusive of 
Tyr608 (mouse)—Tyr612 (human). It contains the insulin 
receptor tyrosine kinase substrate motif YMXM (Tyr-Met- 
X-Met). This peptide has been used as a substrate for 
purified insulin receptor (Km-90 uM) and other tyrosine 
kinases in phosphocellulose binding assays. The tyrosine 
phosphorylated version of this peptide binds to phosphati- 
dylinositol 3-kinase (PI 3-kinase) SH2 domain and activates 
the enzyme. 

740Y-P: a cell-permeable phosphopeptide activator of
PI3K. The PDGFR 740Y-P peptide stimulates a mitogenic
response in muscle cells. The ability of the 740Y-P peptide
to stimulate mitogenesis is highly specific and not a general
feature of cell-permeable SH2 domain-binding peptides.
See ncbi.nlm.nih.gov/pubmed/9790922.

mTORC1, mTORC2 inhibitors: sapanisertib (Wise- 
Draper et al. 2017; Moore et al. 2018), dactolisib (Wise- 
Draper et al. 2017). 

Inhibitors of PDK1: GSK2334470 (GlaxoSmithKline),
MP7 (Merck) (Emmanouilidi and Falasca 2017).

Compounds found based on transcriptional signature 
analysis according to the procedure described in example 1: 
Withaferin A, Lavendustin A, Sulforaphane. Senoremediator 
compounds can be administered orally, by injection,
sublingually, buccally, rectally, vaginally, cutaneously,
transdermally, ocularly, otically, nasally, or by another method.


Example 2 
Analysis of Age Predictor Outputs 


FIG. 11 illustrates the delta (difference between assigned
(predicted) biological age and actual chronological age) bar
plots grouped by age ranges for healthy people based on an
exemplary validation set as described. Delta reflects the
disagreement between the chronological age and the
predicted age. The larger the delta value, the larger the
disagreement between the age values predicted by the model
and the actual chronological age of individuals. In the case
of diseased patients, unhealthy aged patients, or patients on
treatment, the predicted age may differ significantly from
their actual chronological age.
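
The delta grouping described above can be sketched as follows. The 10-year bin width, the function name `age_deltas_by_range`, and the sample ages are assumptions made for illustration; delta is computed here as predicted age minus chronological age, and the mean delta is reported per age range.

```python
from collections import defaultdict


def age_deltas_by_range(chronological, predicted, bin_width=10):
    """Mean delta (predicted minus chronological age) per chronological age range."""
    groups = defaultdict(list)
    for actual, pred in zip(chronological, predicted):
        low = int(actual // bin_width) * bin_width
        groups[(low, low + bin_width)].append(pred - actual)
    return {age_range: sum(deltas) / len(deltas)
            for age_range, deltas in sorted(groups.items())}


# Made-up example values, not data from the validation set.
chronological = [23, 27, 45, 48, 71]
predicted = [25, 31, 44, 40, 80]
mean_delta = age_deltas_by_range(chronological, predicted)
```

A positive mean delta in a range indicates samples predicted older than their chronological age, and a negative one indicates samples predicted younger.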

Gene expression profiles were collected from the publicly
available repositories Gene Expression Omnibus
(ncbi.nlm.nih.gov/geo/) and ArrayExpress (ebi.ac.uk/arrayexpress/).
Here we present case studies and examples of the analysis
of age predictor outputs. Such age predictors can also be
used to study age acceleration caused by hazardous
environmental exposures or diseases. We analyzed two
datasets, GSE10846 and E-MTAB-4015.

We first analyzed the GSE10846 dataset containing the
survival, treatment information and gene expression data for
412 patients with diffuse large B cell lymphoma (e.g.,
disease analysis) treated with chemotherapy or chemotherapy
plus Rituximab. Being predicted by the model to be younger
than one's chronological age is associated with a good
prognosis.

Patients that were found to have an older transcriptomic- 
age (e.g., age predicted by the model) than their chrono- 
logical age had increased risk of dying and vice versa. A 
younger blood age could, therefore, be a useful outcome 
measure in interventions for healthy aging. 

FIG. 12 shows an example of a biological age clock, or a
report thereof. To investigate the predictive ability of deep
transcriptomic aging clocks (e.g., biological aging clock) on
mortality, we employed chronological age- and sex-adjusted
Cox regression models. Samples predicted to be younger
than actual age consistently demonstrated a decrease in the
hazard ratio (33%), while samples predicted to be older
than actual age demonstrated a significant increase in the
hazard ratio (12%). Thus, the hazard ratio can be used in the
methods of the present invention.

We next analyzed the E-MTAB-4015 dataset, which contains
smoking status and health status (e.g., lifestyle analysis)
and gene expression data for 211 individuals with and
without Chronic Obstructive Pulmonary Disease (COPD).
Tobacco smoking creates a significant strain on healthcare
systems worldwide, as it is a major risk factor for a host of
chronic diseases and a potential culprit in premature aging
and mortality.

FIG. 13 shows an example of a biological age clock, or a
report thereof. The actual and predicted ages for current
smokers, non-smokers, former smokers and individuals with
COPD are shown. Non-smokers demonstrated a lower predicted
age compared to current and former smokers and to
individuals with COPD. The mean predicted age of non-smokers
is 60 years, compared to a mean of 63 years for current
smokers and 63 for COPD individuals (p-value<0.05).

For this and other processes and methods disclosed 
herein, the operations performed in the processes and meth- 
ods may be implemented in differing order. Furthermore, the 
outlined operations are only provided as examples, and 
some operations may be optional, combined into fewer 
operations, eliminated, supplemented with further opera- 
tions, or expanded into additional operations, without 
detracting from the essence of the disclosed embodiments. 

The present disclosure is not to be limited in terms of the 
particular embodiments described in this application, which 
are intended as illustrations of various aspects. Many
modifications and variations can be made without departing from
its spirit and scope. Functionally equivalent methods and
apparatuses within the scope of the disclosure, in addition to 
those enumerated herein, are possible from the foregoing 
descriptions. Such modifications and variations are intended 
to fall within the scope of the appended claims. The present 
disclosure is to be limited only by the terms of the appended 
claims, along with the full scope of equivalents to which 
such claims are entitled. The terminology used herein is for 
the purpose of describing particular embodiments only, and 
is not intended to be limiting. 

In one embodiment, the present methods can include 
aspects performed on a computing system. As such, the 
computing system can include a memory device that has the 
computer-executable instructions for performing the meth- 
ods. The computer-executable instructions can be part of a 
computer program product that includes one or more algo- 
rithms for performing any of the methods of any of the 
claims. 

In one embodiment, any of the operations, processes, or 
methods described herein can be performed or caused to be
performed in response to execution of computer-readable 
instructions stored on a computer-readable medium and 
executable by one or more processors. The computer-read- 
able instructions can be executed by a processor of a wide 
range of computing systems from desktop computing sys- 
tems, portable computing systems, tablet computing sys- 
tems, hand-held computing systems, as well as network 
elements, and/or any other computing device. The computer 
readable medium is not transitory. The computer readable 
medium is a physical medium having the computer-readable 
instructions stored therein so as to be physically readable 
from the physical medium by the computer/processor. 

There are various vehicles by which processes and/or 
systems and/or other technologies described herein can be 
effected (e.g., hardware, software, and/or firmware), and that 
the preferred vehicle may vary with the context in which the 
processes and/or systems and/or other technologies are 
deployed. For example, if an implementer determines that 
speed and accuracy are paramount, the implementer may opt 
for a mainly hardware and/or firmware vehicle; if flexibility 
is paramount, the implementer may opt for a mainly soft- 
ware implementation; or, yet again alternatively, the imple- 
menter may opt for some combination of hardware, soft- 
ware, and/or firmware. 

The various operations described herein can be imple- 
mented, individually and/or collectively, by a wide range of 
hardware, software, firmware, or virtually any combination 
thereof. In one embodiment, several portions of the subject 
matter described herein may be implemented via application 
specific integrated circuits (ASICs), field programmable 
gate arrays (FPGAs), digital signal processors (DSPs), or 
other integrated formats. However, some aspects of the 
embodiments disclosed herein, in whole or in part, can be 
equivalently implemented in integrated circuits, as one or 
more computer programs running on one or more computers 
(e.g., as one or more programs running on one or more 
computer systems), as one or more programs running on one 
or more processors (e.g., as one or more programs running 
on one or more microprocessors), as firmware, or as virtu- 
ally any combination thereof, and that designing the cir- 
cuitry and/or writing the code for the software and/or 
firmware are possible in light of this disclosure. In addition, 
the mechanisms of the subject matter described herein are 
capable of being distributed as a program product in a 
variety of forms, and that an illustrative embodiment of the 
subject matter described herein applies regardless of the 
particular type of signal bearing medium used to actually
carry out the distribution. Examples of a physical signal
bearing medium include, but are not limited to, the follow- 
ing: a recordable type medium such as a floppy disk, a hard 
disk drive (HDD), a compact disc (CD), a digital versatile 
disc (DVD), a digital tape, a computer memory, or any other 
physical medium that is not transitory or a transmission. 
Examples of physical media having computer-readable 
instructions omit transitory or transmission type media such 
as a digital and/or an analog communication medium (e.g., 
a fiber optic cable, a waveguide, a wired communication 
link, a wireless communication link, etc.). 

It is common to describe devices and/or processes in the 
fashion set forth herein, and thereafter use engineering 
practices to integrate such described devices and/or pro- 
cesses into data processing systems. That is, at least a 
portion of the devices and/or processes described herein can 
be integrated into a data processing system via a reasonable 
amount of experimentation. A typical data processing
system generally includes one or more of a system unit housing,
a video display device, a memory such as volatile and 
non-volatile memory, processors such as microprocessors 
and digital signal processors, computational entities such as 
operating systems, drivers, graphical user interfaces, and 
applications programs, one or more interaction devices, such 
as a touch pad or screen, and/or control systems, including 
feedback loops and control motors (e.g., feedback for sens- 
ing position and/or velocity; control motors for moving 
and/or adjusting components and/or quantities). A typical 
data processing system may be implemented utilizing any 
suitable commercially available components, such as those 
generally found in data computing/communication and/or 
network computing/communication systems. 

The herein described subject matter sometimes illustrates 
different components contained within, or connected with, 
different other components. Such depicted architectures are 
merely exemplary, and in fact many other architectures
can be implemented which achieve the same functionality. 
In a conceptual sense, any arrangement of components to 
achieve the same functionality is effectively “associated” 
such that the desired functionality is achieved. Hence, any 
two components herein combined to achieve a particular 
functionality can be seen as “associated with” each other
such that the desired functionality is achieved, irrespective
of architectures or intermedial components. Likewise, any
two components so associated can also be viewed as being
“operably connected”, or “operably coupled”, to each other
to achieve the desired functionality, and any two
components capable of being so associated can also be viewed as
being “operably couplable”, to each other to achieve the
desired functionality. Specific examples of operably cou- 
plable include, but are not limited to: physically mateable 
and/or physically interacting components and/or wirelessly 
interactable and/or wirelessly interacting components and/or 
logically interacting and/or logically interactable compo- 
nents. 

FIG. 14 shows an example computing device 600 (e.g., a 
computer) that may be arranged in some embodiments to 
perform the methods (or portions thereof) described herein. 
In a very basic configuration 602, computing device 600 
generally includes one or more processors 604 and a system 
memory 606. A memory bus 608 may be used for commu- 
nicating between processor 604 and system memory 606. 

Depending on the desired configuration, processor 604 
may be of any type including, but not limited to: a
microprocessor (µP), a microcontroller (µC), a digital signal
processor (DSP), or any combination thereof. Processor 604 
may include one or more levels of caching, such as a level
one cache 610 and a level two cache 612, a processor core
614, and registers 616. An example processor core 614 may 
include an arithmetic logic unit (ALU), a floating point unit 
(FPU), a digital signal processing core (DSP Core), or any 
combination thereof. An example memory controller 618 
may also be used with processor 604, or in some implemen- 
tations, memory controller 618 may be an internal part of 
processor 604. 

Depending on the desired configuration, system memory 
606 may be of any type including, but not limited to: volatile 
memory (such as RAM), non-volatile memory (such as 
ROM, flash memory, etc.), or any combination thereof. 
System memory 606 may include an operating system 620, 
one or more applications 622, and program data 624. Appli- 
cation 622 may include a determination application 626 that 
is arranged to perform the operations as described herein, 
including those described with respect to methods described 
herein. The determination application 626 can obtain data, 
such as pressure, flow rate, and/or temperature, and then 
determine a change to the system to change the pressure, 
flow rate, and/or temperature. 

Computing device 600 may have additional features or 
functionality, and additional interfaces to facilitate commu- 
nications between basic configuration 602 and any required 
devices and interfaces. For example, a bus/interface con- 
troller 630 may be used to facilitate communications 
between basic configuration 602 and one or more data 
storage devices 632 via a storage interface bus 634. Data 
storage devices 632 may be removable storage devices 636, 
non-removable storage devices 638, or a combination 
thereof. Examples of removable storage and non-removable 
storage devices include: magnetic disk devices such as 
flexible disk drives and hard-disk drives (HDD), optical disk 
drives such as compact disk (CD) drives or digital versatile 
disk (DVD) drives, solid state drives (SSD), and tape drives 
to name a few. Example computer storage media may 
include: volatile and non-volatile, removable and non-re- 
movable media implemented in any method or technology 
for storage of information, such as computer readable 
instructions, data structures, program modules, or other data. 

System memory 606, removable storage devices 636 and 
non-removable storage devices 638 are examples of com- 
puter storage media. Computer storage media includes, but 
is not limited to: RAM, ROM, EEPROM, flash memory or 
other memory technology, CD-ROM, digital versatile disks 
(DVD) or other optical storage, magnetic cassettes, mag- 
netic tape, magnetic disk storage or other magnetic storage 
devices, or any other medium which may be used to store the 
desired information and which may be accessed by com- 
puting device 600. Any such computer storage media may be 
part of computing device 600. 

Computing device 600 may also include an interface bus 
640 for facilitating communication from various interface 
devices (e.g., output devices 642, peripheral interfaces 644, 
and communication devices 646) to basic configuration 602 
via bus/interface controller 630. Example output devices 
642 include a graphics processing unit 648 and an audio 
processing unit 650, which may be configured to commu- 
nicate to various external devices such as a display or 
speakers via one or more A/V ports 652. Example peripheral 
interfaces 644 include a serial interface controller 654 or a 
parallel interface controller 656, which may be configured to 
communicate with external devices such as input devices 
(e.g., keyboard, mouse, pen, voice input device, touch input 
device, etc.) or other peripheral devices (e.g., printer, scan- 
ner, etc.) via one or more I/O ports 658. An example 
communication device 646 includes a network controller 
660, which may be arranged to facilitate communications 
with one or more other computing devices 662 over a 
network communication link via one or more communica- 
tion ports 664. 

The network communication link may be one example of 
communication media. Communication media may gen-
erally be embodied by computer readable instructions, data
structures, program modules, or other data in a modulated 
data signal, such as a carrier wave or other transport mecha- 
nism, and may include any information delivery media. A 
“modulated data signal” may be a signal that has one or 
more of its characteristics set or changed in such a manner 
as to encode information in the signal. By way of example, 
and not limitation, communication media may include wired 
media such as a wired network or direct-wired connection, 
and wireless media such as acoustic, radio frequency (RF), 
microwave, infrared (IR), and other wireless media. The 
term computer readable media as used herein may include 
both storage media and communication media. 

Computing device 600 may be implemented as a portion 
of a small-form factor portable (or mobile) electronic device 
such as a cell phone, a personal data assistant (PDA), a 
personal media player device, a wireless web-watch device, 
a personal headset device, an application specific device, or 
a hybrid device that includes any of the above functions. 
Computing device 600 may also be implemented as a 
personal computer including both laptop computer and 
non-laptop computer configurations. The computing device 
600 can also be any type of network computing device. The 
computing device 600 can also be an automated system as 
described herein. 

The embodiments described herein may include the use of 
a special purpose or general-purpose computer including 
various computer hardware or software modules. 

Embodiments within the scope of the present invention 
also include computer-readable media for carrying or having 
computer-executable instructions or data structures stored 
thereon. Such computer-readable media can be any available 
media that can be accessed by a general purpose or special 
purpose computer. By way of example, and not limitation, 
such computer-readable media can comprise RAM, ROM, 
EEPROM, CD-ROM or other optical disk storage, magnetic 
disk storage or other magnetic storage devices, or any other 
medium which can be used to carry or store desired program 
code means in the form of computer-executable instructions 
or data structures and which can be accessed by a general 
purpose or special purpose computer. When information is 
transferred or provided over a network or another commu- 
nications connection (either hardwired, wireless, or a com- 
bination of hardwired or wireless) to a computer, the com- 
puter properly views the connection as a computer-readable 
medium. Thus, any such connection is properly termed a 
computer-readable medium. Combinations of the above 
should also be included within the scope of computer- 
readable media. 

Computer-executable instructions comprise, for example, 
instructions and data which cause a general purpose com- 
puter, special purpose computer, or special purpose process- 
ing device to perform a certain function or group of func- 
tions. Although the subject matter has been described in 
language specific to structural features and/or methodologi- 
cal acts, it is to be understood that the subject matter defined 
in the appended claims is not necessarily limited to the 
specific features or acts described above. Rather, the specific 
features and acts described above are disclosed as example 
forms of implementing the claims. 



With respect to the use of substantially any plural and/or 
singular terms herein, those having skill in the art can 
translate from the plural to the singular and/or from the 
singular to the plural as is appropriate to the context and/or 
application. The various singular/plural permutations may 
be expressly set forth herein for sake of clarity. 

It will be understood by those within the art that, in 
general, terms used herein, and especially in the appended 
claims (e.g., bodies of the appended claims) are generally 
intended as “open” terms (e.g., the term “including” should 
be interpreted as “including but not limited to,” the term 
“having” should be interpreted as “having at least,” the term 
“includes” should be interpreted as “includes but is not 
limited to,” etc.). It will be further understood by those 
within the art that if a specific number of an introduced claim 
recitation is intended, such an intent will be explicitly recited 
in the claim, and in the absence of such recitation, no such 
intent is present. For example, as an aid to understanding, 
the following appended claims may contain usage of the 
introductory phrases “at least one” and “one or more” to 
introduce claim recitations. However, the use of such 
phrases should not be construed to imply that the introduc- 
tion of a claim recitation by the indefinite articles “a” or “an” 
limits any particular claim containing such introduced claim 
recitation to embodiments containing only one such recita- 
tion, even when the same claim includes the introductory 
phrases “one or more” or “at least one” and indefinite 
articles such as “a” or “an” (e.g., “a” and/or “an” should be 
interpreted to mean “at least one” or “one or more”); the 
same holds true for the use of definite articles used to 
introduce claim recitations. In addition, even if a specific 
number of an introduced claim recitation is explicitly 
recited, those skilled in the art will recognize that such 
recitation should be interpreted to mean at least the recited 
number (e.g., the bare recitation of “two recitations,” with- 
out other modifiers, means at least two recitations, or two or 
more recitations). Furthermore, in those instances where a 
convention analogous to “at least one of A, B, and C, etc.” 
is used, in general, such a construction is intended in the 
sense one having skill in the art would understand the 
convention (e.g., “a system having at least one of A, B, and 
C” would include but not be limited to systems that have A 
alone, B alone, C alone, A and B together, A and C together, 
B and C together, and/or A, B, and C together, etc.). It will 
be further understood by those within the art that virtually 
any disjunctive word and/or phrase presenting two or more 
alternative terms, whether in the description, claims, or 
drawings, should be understood to contemplate the possi- 
bilities of including one of the terms, either of the terms, or 
both terms. For example, the phrase “A or B” will be 
understood to include the possibilities of “A” or “B” or “A 
and B.” 

In addition, where features or aspects of the disclosure are 
described in terms of Markush groups, those skilled in the 
art will recognize that the disclosure is also thereby 
described in terms of any individual member or subgroup of 
members of the Markush group. 

As will be understood by one skilled in the art, for any and 
all purposes, such as in terms of providing a written descrip- 
tion, all ranges disclosed herein also encompass any and all 
possible subranges and combinations of subranges thereof. 
Any listed range can be easily recognized as sufficiently 
describing and enabling the same range being broken down 
into at least equal halves, thirds, quarters, fifths, tenths, etc. 
As a non-limiting example, each range discussed herein can 
be readily broken down into a lower third, middle third and 
upper third, etc. As will also be understood by one skilled in 
the art, all language such as “up to,” “at least,” and the like 
include the number recited and refer to ranges which can be 
subsequently broken down into subranges as discussed 
above. Finally, as will be understood by one skilled in the 
art, a range includes each individual member. Thus, for 
example, a group having 1-3 cells refers to groups having 1, 
2, or 3 cells. Similarly, a group having 1-5 cells refers to 
groups having 1, 2, 3, 4, or 5 cells, and so forth. 

From the foregoing, it will be appreciated that various 
embodiments of the present disclosure have been described 
herein for purposes of illustration, and that various modifi- 
cations may be made without departing from the scope and 
spirit of the present disclosure. Accordingly, the various 
embodiments disclosed herein are not intended to be limit- 
ing, with the true scope and spirit being indicated by the 
following claims. 

Definitions: 

A “biopsy” is a medical test involving extraction of 
sample cells or tissues for examination, and can be analyzed 
chemically. When only a sample of tissue is removed with 
preservation of the histological architecture of the tissue’s 
cells, the procedure is called an incisional biopsy or core 
biopsy. When a sample of tissue or fluid is removed with a 
needle in such a way that cells are removed without pre- 
serving the histological architecture of the tissue cells, the 
procedure is called a needle aspiration biopsy. 

“Senescence” is biological aging, that is, the gradual 
deterioration of function and ability in almost all life forms, 
mostly after maturation and in particular multi-cellular life. 
Senescence increases mortality. Senescence can refer to cellular 
senescence, tissue senescence, organ senescence, and senes- 
cence of the whole organism. Cellular senescence largely 
underlies organismal senescence. The boundary between 
disease and senescence can be unclear, as organisms, tissues, 
and cells may have characteristics of both, and disease and 
senescence are often associated with each other. 

“Cellular senescence” is not the aging of an individual 
cell, but instead, the state (gene expression) of a cell with 
respect to the senescence of its tissue or organism, in 
comparison to a less senescent tissue or organism. Cell 
senescence may partly be the result of telomere shortening in 
cells, which may trigger a DNA damage response. Cells can 
also be induced to senesce via DNA damage in response to 
elevated reactive oxygen species, activation of oncogenes, 
cell-to-cell fusion, and other causes. As such, cellular senes- 
cence represents a change in “cell state” rather than a cell 
becoming “aged.” The number of senescent cells in tissues 
rises substantially during normal aging. Cells may also 
experience “replicative senescence”, in which they can no 
longer divide. There is a “senescence associated secretory 
phenotype” (SASP) associated with senescent cells, which is 
associated with, for example, an increase in inflammatory 
cytokines, growth factors, and proteases. Cellular senes- 
cence contributes to age-related diseases, such as athero- 
sclerosis. 

“Fibrosis” is the accumulation of excess fibrous connec- 
tive cells or other similarly stiff, structural cells, called 
“fibrotic cells,” in an organ or tissue. Such fibrosis can be a 
normal, functional part of the reparative process (such as 
scarring) but can also be pathological. Excess and unneces- 
sary fibrosis is associated with senescence and typically 
decreases the flexibility and other functions of a tissue or organ. 
Fibrotic cells generally have an excess of extracellular 
matrix proteins which contribute to their stiffness. 

A “senolytic” is a drug or other treatment that can 
selectively induce death of senescent cells. 

A “senoremediator” is a drug or other treatment that can 
restore or increase the number of presenescent or nonsenes- 
cent cells. 

“Machine learning” (ML) is a subfield of computer sci- 
ence that gives computers the ability to learn without being 
explicitly programmed. Machine learning platforms include, 
but are not limited to, naive Bayes classifiers, support vector 
machines, decision trees, and neural networks. 

“Artificial neural networks”, also called “ANNs” or just 
“neural networks”, are based on a large collection of con- 
nected simple units called artificial neurons, loosely analo- 
gous to neurons in a biological brain. If the combined incom- 
ing signals are strong enough, the neuron becomes activated 
and the signal travels to other neurons connected to it. The 
activation function of such neurons is often, though not 
always, represented as a sigmoid function. 
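
As a minimal sketch of the neuron behavior described above, the following Python example computes a weighted sum of incoming signals and passes it through a sigmoid activation. The function names, weights, and bias are hypothetical values chosen only for illustration and are not taken from the disclosure.

```python
import math


def sigmoid(z: float) -> float:
    """Logistic sigmoid, mapping any real input into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def neuron_output(inputs, weights, bias):
    """Combine the incoming signals as a weighted sum, then apply the sigmoid."""
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return sigmoid(z)


# Hypothetical parameters: two inputs with equal weight and a negative bias.
weights = [2.0, 2.0]
bias = -3.0

# Strong combined input (weighted sum of 1.0) versus weak input (weighted sum of -3.0).
strong = neuron_output([1.0, 1.0], weights, bias)
weak = neuron_output([0.0, 0.0], weights, bias)
print(f"strong input -> {strong:.3f}, weak input -> {weak:.3f}")
```

With these assumed values the strong input yields an output above 0.5 (the neuron is "activated"), while the weak input yields an output near zero.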

“Deep learning” (DL) (also known as deep structured 
learning, hierarchical learning or deep machine learning) is 
the study of artificial neural networks that contain more than 
one hidden layer of neurons. Such a neural network is called 
a “deep neural network”. A “convolutional neural network” 
is a type of neural network in which the connectivity pattern 
is inspired by the organization of the animal visual cortex. 

“Principal component analysis” (PCA) is a statistical 
procedure that uses an orthogonal transformation to convert 
a set of observations of variables into a set of values of 
linearly uncorrelated variables called principal components. 
The transformation is defined in such a way that the first 
principal component has the largest possible variance and 
each succeeding component in turn has the highest variance 
possible under the constraint that it is orthogonal to the 
preceding components. 
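
The procedure can be illustrated with a small Python sketch for two-dimensional data. It is only one possible way to obtain the components: the first principal component is estimated here by power iteration on the sample covariance matrix, which is an assumption of this example rather than a technique named in the disclosure, and the data points are invented for illustration.

```python
import math


def center(rows):
    """Subtract the per-column mean from every observation."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - means[j] for j in range(d)] for r in rows]


def covariance(rows):
    """Sample covariance matrix of already-centered observations."""
    n, d = len(rows), len(rows[0])
    return [[sum(r[i] * r[j] for r in rows) / (n - 1) for j in range(d)]
            for i in range(d)]


def first_component(cov, iters=200):
    """Power iteration: converges to the direction of largest variance."""
    d = len(cov)
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


def variance_along(direction, rows):
    """Sample variance of the centered data projected onto a unit direction."""
    proj = [sum(a * b for a, b in zip(r, direction)) for r in rows]
    return sum(p * p for p in proj) / (len(rows) - 1)


# Invented observations of two strongly correlated variables.
data = [[2.0, 1.9], [0.0, 0.1], [-2.0, -2.1], [1.0, 1.1], [-1.0, -1.0]]
centered = center(data)
cov = covariance(centered)

pc1 = first_component(cov)
pc2 = [-pc1[1], pc1[0]]  # in two dimensions, the only direction orthogonal to pc1

var1 = variance_along(pc1, centered)
var2 = variance_along(pc2, centered)
print("PC1:", pc1, "variance:", var1)
print("PC2:", pc2, "variance:", var2)
```

In this sketch the first component captures nearly all of the variance, and the two component variances sum to the total variance of the original variables.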

“Generative adversarial networks” (GANs) are neural 
networks that are trained in an adversarial manner to gen- 
erate data mimicking some distribution. A discriminative 
model is a model that discriminates between two (or more) 
different classes of data, for example a convolutional neural 
network that is trained to output 1 given an image of a 
human face and 0 otherwise. A generative model by contrast 
generates new data which fits the distribution of the training 
data. GANs are well known in the art, as described, for 
example, in (2) Goodfellow et al., “Generative Adversarial 
Networks”, arXiv:1406.2661v1, 2014. 

An “autoencoder” is a neural network architecture gen- 
erally used for unsupervised learning of efficient coding. An 
autoencoder learns representations (encodings) for a set of 
data, often for the purpose of dimensionality reduction. An 
“adversarial autoencoder” (AAE) is an autoencoder that 
uses generative adversarial networks (GAN) to perform 
variational inference by matching the aggregated posterior 
of the hidden code vector of the autoencoder with an 
arbitrary prior distribution. AAEs are well known in the art, 
as described, for example, in Makhzani et al., “Adversarial 
Autoencoders”, arXiv:1511.05644v2, 2015. Application of 
AAEs to new molecule development such as drugs is also 
well-known in the art, as described, for example, in Kadurin 
et al., “The cornucopia of meaningful leads: Applying deep 
adversarial autoencoders for new molecule development in 
oncology”, Oncotarget, 2017, Vol. 8, (No. 7), pp: 10883- 
10890. 

Feature importance is a statistical method to evaluate the 
importance of input features for the prediction of the output 
target. Feature importance methods include, but are not 
limited to, the ensemble-based wrapper method called 
Permutation Feature Importance (PFI). First, a model is 
trained on the feature set; then the vector of a feature of 
interest is randomly shuffled and used for training the same 
model. The model scores before and after the random 
shuffling are then compared, and a relative importance score 
is assigned to the feature of interest. 
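
The permutation procedure can be sketched in Python as follows. This is an illustrative example under stated assumptions, not the implementation used in the disclosure: the model is an ordinary least-squares regression, the score is the coefficient of determination (R²), and the data are synthetic. Following the description above, the model is refit after each feature is shuffled; many practical implementations instead re-score an already fitted model on the shuffled data.

```python
import random


def fit_linear(X, y):
    """Ordinary least squares with an intercept, solved via the normal equations."""
    rows = [[1.0] + list(r) for r in X]
    d = len(rows[0])
    # Augmented matrix [A^T A | A^T y].
    M = [[sum(r[i] * r[j] for r in rows) for j in range(d)]
         + [sum(r[i] * t for r, t in zip(rows, y))] for i in range(d)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(d):
        p = max(range(c, d), key=lambda k: abs(M[k][c]))
        M[c], M[p] = M[p], M[c]
        for k in range(d):
            if k != c:
                f = M[k][c] / M[c][c]
                M[k] = [a - f * b for a, b in zip(M[k], M[c])]
    return [M[i][d] / M[i][i] for i in range(d)]


def r2_score(coef, X, y):
    """Coefficient of determination of the fitted linear model on (X, y)."""
    pred = [coef[0] + sum(c * v for c, v in zip(coef[1:], r)) for r in X]
    mean_y = sum(y) / len(y)
    ss_res = sum((t - p) ** 2 for t, p in zip(y, pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y)
    return 1.0 - ss_res / ss_tot


def permutation_importance(X, y, rng):
    """Score drop observed when each feature column is shuffled in turn."""
    baseline = r2_score(fit_linear(X, y), X, y)
    drops = []
    for j in range(len(X[0])):
        column = [r[j] for r in X]
        rng.shuffle(column)  # break the link between feature j and the target
        X_perm = [r[:j] + [v] + r[j + 1:] for r, v in zip(X, column)]
        permuted = r2_score(fit_linear(X_perm, y), X_perm, y)
        drops.append(baseline - permuted)
    return drops


# Synthetic data (an assumption): feature 0 matters most, feature 2 not at all.
rng = random.Random(0)
X = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(300)]
y = [3.0 * r[0] + 0.5 * r[1] + rng.gauss(0, 0.1) for r in X]

importance = permutation_importance(X, y, rng)
for j, score in enumerate(importance):
    print(f"feature {j}: importance {score:.4f}")
```

In the gene-expression setting, each column would correspond to a gene or gene set, and the resulting scores could be used to rank them by their contribution to age prediction.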

Deep feature selection (DFS) is a method proposed in 
2016 by Wasserman et al. (“Deep Feature Selection: Theory 
and Application to Identify Enhancers and Promoters,” Li 
Y, Chen C Y, Wasserman W W, J Comput Biol. 2016 May; 
23(5):322-36. doi: 10.1089/cmb.2015.0189. Epub 2016 Jan. 
22). The method is based on a deep neural network that can 
select features at the input layer of the neural network. 

A Support Vector Machine is a discriminative classifier: 
given labeled training data, the algorithm outputs an optimal 
hyperplane which categorizes new data points/examples. 
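
A minimal Python sketch of a linear support vector machine is given below for illustration. It approximates the separating hyperplane by subgradient descent on the regularized hinge loss, which is one common training approach and an assumption of this example; the training points, labels, and hyperparameters (lam, epochs, lr) are hypothetical.

```python
def train_linear_svm(points, labels, lam=0.01, epochs=200, lr=0.1):
    """Subgradient descent on the L2-regularized hinge loss; labels are +1 or -1."""
    w = [0.0] * len(points[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(points, labels):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            if margin < 1:
                # Point is inside the margin: move the hyperplane toward classifying it.
                w = [wi - lr * (lam * wi - y * xi) for wi, xi in zip(w, x)]
                b += lr * y
            else:
                # Point is safely classified: apply only the regularization shrinkage.
                w = [wi - lr * lam * wi for wi in w]
    return w, b


def predict(w, b, x):
    """Classify a new point by the side of the hyperplane w.x + b = 0 it falls on."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1


# Hypothetical, linearly separable training data with two classes.
points = [[2.0, 2.0], [3.0, 3.0], [2.5, 3.5],
          [-2.0, -2.0], [-3.0, -1.5], [-2.5, -3.0]]
labels = [1, 1, 1, -1, -1, -1]

w, b = train_linear_svm(points, labels)
print("weights:", w, "bias:", b)
print("new point [4, 4] ->", predict(w, b, [4.0, 4.0]))
print("new point [-4, -4] ->", predict(w, b, [-4.0, -4.0]))
```

The learned weights and bias define the hyperplane, and new examples are categorized according to which side of it they fall on.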

All references recited herein and/or recited in the provi- 
sional applications 62/536,658 filed Jul. 25, 2017 and/or 
62/547,061 filed Aug. 17, 2017 are incorporated herein by 
specific reference in their entirety. 

The invention claimed is: 

1. A method of creating a biological aging clock for a 
subject, the method comprising: 

(a) receiving a transcriptome signature derived from a 
tissue or organ of the subject; 

(b) creating input vectors based on the transcriptome 
signature; 

(c) inputting the input vectors into a machine learning 
platform; 

(d) generating a predicted biological aging clock of the 
tissue or organ based on the input vectors by the 
machine learning platform, wherein the biological 
aging clock is specific to the tissue or organ; and 

(e) preparing a report that includes the biological aging 
clock that identifies a predicted biological age of the 
tissue or organ. 

2. The method of claim 1, further comprising: 

creating at least a second biological aging clock by 
repeating any one or more of steps (a), (b), (c), and/or 
(d), wherein the second biological aging clock is based 
on a transcriptome from the tissue or organ of the 
subject, a different tissue or organ of the subject or a 
tissue or organ of a second subject; and 

optionally, preparing a report that includes the second 
biological aging clock that identifies a second predicted 
biological age of the tissue or organ of the subject, a 
different tissue or organ of the subject or a tissue or 
organ of a second subject. 

3. The method of claim 2, further comprising: 

combining the biological aging clock with the second 
biological aging clock to create a synthetic biological 
aging clock, wherein the synthetic biological aging 
clock provides a synthetic biological age of the tissue, 
organ, or of the subject; and 

optionally, preparing a report that includes the synthetic 
biological aging clock that identifies the synthetic bio- 
logical age of the tissue, organ, or of the subject. 

4. The method of claim 3, further comprising one or more 
of: 

comparing the predicted biological age of the tissue or 
organ with the actual age of the subject; 

comparing the second predicted biological age of the 
tissue or organ with the actual age of the subject; 

comparing the synthetic biological age of the tissue or 
organ with the actual age of the subject, 

wherein the method further comprises: 

preparing a report with the comparing and with a differ- 
ence from the actual age of the subject. 

5. The method of claim 1, wherein the report includes one 
or more of: 

a therapeutic regimen based on the predicted biological 
age in view of an actual age of the subject; 

a diet regimen based on the predicted biological age in 
view of an actual age of the subject; 

a questionnaire about lifestyle habits; 

a prognosis of the life expectancy with and/or without the 
therapeutic regimen; 

a prognosis of the life expectancy with and/or without the 
diet regimen; 

a prognosis of the probability of survival of patient during 
the therapeutic regimen; or 

a prognosis of the probability of survival of patient during 
the diet regimen. 

6. The method of claim 1, wherein the tissue or organ is: 

diseased; 

healthy; 

determined as susceptible to disease; 

undergoing senescence; 

in pre-senescence; or 

non-senescent. 

7. The method of claim 5, wherein the therapeutic regi- 
men includes one or more of: 

applying a senoremediation drug treatment protocol to the 
subject in order to rescue one or more first cells in the 
subject; 

applying a senolytic drug treatment protocol to the subject 
in order to remove one or more second cells in the 
subject; 

introducing stem cells into a tissue and/or organ of the 
subject in order to rejuvenate one or more tissue cells 
in the tissue and/or one or more organ cells in the organ; 

carrying out a reinforcement step that includes one or 
more actions that prevent further senescence or degra- 
dation of the tissue or organ; or 

one or more actions that prevent further senescence or 
degradation of the tissue or organ is derived from the 
computational transcriptome analysis of the tissue or 
organ of the subject. 

8. The method of claim 7, further comprising: 

performing feature importance analysis for ranking genes 
or gene sets by their importance in age prediction; or 

correlating a gene expression level with the predicted 
biological age of the subject; 

identifying a subset of genes or gene sets or biological 
pathways thereof that are selected as targets of the thera- 
peutic regimen; 

correlating a biological signaling pathway signature with 
the predicted biological age of the subject. 

9. The method of claim 1, wherein the transcriptome 
signatures are based on signaling pathway activation signa- 
tures. 

10. The method of claim 1, after a defined time period, 

performing steps (a), (b), (c), (d), and (e) in a second 
iteration; and 

comparing the initial report with the report of the second 
iteration; and 

determining a change in the predicted biological age over 
the defined time period. 

11. The method of claim 1, further comprising: 

performing a therapeutic regimen over a defined time 
period, 

performing steps (a), (b), (c), (d), and (e) in a second 
iteration; and 

comparing the initial report with the report of the second 
iteration; 

determining a change in the predicted biological age over 
the defined time period; and 

determining: 

whether the therapeutic regimen changed the predicted 
biological age, 

if the therapeutic regimen changed the predicted biologi- 
cal age, then determine whether or not to: continue 
therapeutic regimen, change therapeutic regimen, or 
stop therapeutic regimen; or 

if the therapeutic regimen does not change the predicted 
biological age, then determine whether or not to: con- 
tinue therapeutic regimen, change therapeutic regimen, 
or stop therapeutic regimen. 

12. The method of claim 1, further comprising performing 
one or more of: 

a therapeutic regimen based on the predicted biological 
age in view of an actual age of the subject; or 

a diet regimen based on the predicted biological age in 
view of an actual age of the subject. 

13. The method of claim 1, further comprising performing 
one or more of: 

an actuarial assessment of the subject based on the 
predicted biological age; 

a risk assessment based on the predicted biological age; 

an insurance assessment based on the predicted biological 
age. 

14. The method of claim 1, further comprising: 

(f) receiving a second transcriptome signature derived 
from a baseline, the second transcriptome being from a 
second organ or tissue of the subject or a second 
subject, the organ or tissue being the same or different 
from the second organ or tissue; and 

computing a difference between the signature of (a) and 
the signature of (f) to provide input vectors to the 
machine learning platform, wherein the machine learn- 
ing platform outputs classification vectors that com- 
prise components of the biological aging clock. 

15. The method of claim 14, wherein at least one of the 
transcriptome signatures is based on an in silico signaling 
pathway activation network decomposition. 

16. A computer program product comprising a tangible, 
non-transitory computer readable medium having a com- 
puter readable program code stored thereon, the code being 
executable by a processor to perform a method for creating a 
biological aging clock for a patient, the method comprising: 

(a) receiving a transcriptome signature derived from a 
tissue or organ of the subject; 

(b) creating input vectors based on the transcriptome 
signature; 

(c) inputting the input vectors into a machine learning 
platform; 

(d) generating a predicted biological aging clock of the 
tissue or organ based on the input vectors by the 
machine learning platform, wherein the biological 
aging clock is specific to the tissue or organ; and 

(e) preparing a report that includes the biological aging 
clock that identifies a predicted biological age of the 
tissue or organ. 

17. The computer program product of claim 16, the 
method further comprising: 
creating at least a second biological aging clock by 
repeating any one or more of steps (a), (b), (c), and/or 
(d), wherein the second biological aging clock is based 
on a transcriptome from the tissue or organ of the 
subject, a different tissue or organ of the subject or a 
tissue or organ of a second subject; and 
optionally, preparing a report that includes the second 
biological aging clock that identifies a second predicted 
biological age of the tissue or organ of the subject, a 
different tissue or organ of the subject or a tissue or 
organ of a second subject. 
18. The computer program product of claim 17, the 
method further comprising: 
combining the biological aging clock with the second 
biological aging clock to create a synthetic biological 
aging clock, wherein the synthetic biological aging 
clock provides a synthetic biological age of the tissue, 
organ, or of the subject; and 
optionally, preparing a report that includes the synthetic 
biological aging clock that identifies the synthetic bio- 
logical age of the tissue, organ, or of the subject. 
19. The computer program product of claim 16, the 
method further comprising: 
comparing the predicted biological age of the tissue or 
organ with the actual age of the subject; 
comparing the second predicted biological age of the 
tissue or organ with the actual age of the subject; 
comparing the synthetic biological age of the tissue or 
organ with the actual age of the subject, 
wherein the method further comprises: 
preparing a report with the comparing and with a differ- 
ence from the actual age of the subject. 
20. The computer program product of claim 16, the 
method further comprising: 
performing feature importance analysis for ranking genes 
or gene sets by their importance in age prediction; 
correlating a gene expression level with the predicted 
biological age of the subject; 
identifying a subset of genes or gene sets or biological 
pathways thereof that are selected as targets of the thera- 
peutic regimen; or 
correlating a biological signaling pathway signature with 
the predicted biological age of the subject. 
21. The computer program product of claim 16, the 
method further comprising: 
after a defined time period, 
performing steps (a), (b), (c), (d), and (e) in a second 
iteration; and 
comparing the initial report with the report of the second 
iteration; and 
determining a change in the predicted biological age over 
the defined time period. 